Systems and methods for spatial audio rendering

ABSTRACT

Systems and methods for rendering spatial audio in accordance with embodiments of the invention are illustrated. One embodiment includes a spatial audio system, including a primary network connected speaker, including a plurality of sets of drivers, where each set of drivers is oriented in a different direction, a processor system, memory containing an audio player application, wherein the audio player application configures the processor system to obtain an audio source stream from an audio source via the network interface, spatially encode the audio source, decode the spatially encoded audio source to obtain driver inputs for the individual drivers in the plurality of sets of drivers, where the driver inputs cause the drivers to generate directional audio.

CROSS-REFERENCE TO RELATED APPLICATIONS

The current application is a continuation of U.S. patent application Ser. No. 17/003,957 titled “Systems and Methods for Spatial Audio Rendering” filed Aug. 26, 2020, which is a continuation of U.S. patent application Ser. No. 16/839,021 titled “Systems and Methods for Spatial Audio Rendering” filed Apr. 2, 2020, which claims the benefit of and priority under 35 U.S.C. § 119(e) to U.S. Provisional Patent Application No. 62/828,357 titled “System and Architecture for Spatial Audio Control and Reproduction” filed Apr. 2, 2019, U.S. Provisional Patent Application No. 62/878,696 titled “Method and Apparatus for Spatial Multimedia Source Management” filed Jul. 25, 2019, and U.S. Provisional Patent Application No. 62/935,034 titled “Systems and Methods for Spatial Audio Rendering” filed Nov. 13, 2019, the disclosures of which are hereby incorporated by reference in their entirety.

FIELD OF THE INVENTION

The present invention generally relates to spatial audio rendering techniques, namely systems and methods for rendering spatial audio using spatial audio reproduction techniques and/or modal beamforming speaker arrays.

BACKGROUND

Loudspeakers, colloquially “speakers,” are devices that convert an electrical audio input signal or audio signal into a corresponding sound. Speakers are typically housed in an enclosure which may contain multiple speaker drivers. In this case, the enclosure containing multiple individual speaker drivers may itself be referred to as a speaker, and the individual speaker drivers inside can then be referred to as “drivers.” Drivers that output high frequency audio are often referred to as “tweeters.” Drivers that output mid-range frequency audio can be referred to as “mids” or “mid-range drivers.” Drivers that output low frequency audio can be referred to as “woofers.” When describing the frequency of sound, these three bands are commonly referred to as “highs,” “mids,” and “lows.” In some cases, lows are also referred to as “bass.”

Audio tracks are often mixed for a particular speaker arrangement. The most basic recordings are meant for reproduction on one speaker, a format which is now called “mono.” Mono recordings have a single audio channel. Stereophonic audio, colloquially “stereo,” is a method of sound reproduction that creates an illusion of multi-directional audible perspective by having a known, two speaker arrangement coupled with an audio signal recorded and encoded for stereo reproduction. Stereo encodings contain a left channel and right channel, and assume that the ideal listener is at a particular point equidistant from a left speaker and a right speaker. However, stereo provides a limited spatial effect because typically only two front firing speakers are used. Stereo using fewer or more than two loudspeakers can result in suboptimal rendering due to either down mixing or up mixing artifacts, respectively.

Immersive formats now exist that require a much larger number of speakers and associated audio channels to try and correct the limitations of stereo. These higher channel count formats are often referred to as “surround sound.” There are many different speaker configurations associated with these formats such as, but not limited to, 5.1, 7.1, 7.1.4, 10.2, 11.1, and 22.2. However, a problem with these formats is that they require a large number of speakers to be configured correctly, and to be placed in prescribed locations. If the speakers are offset from their ideal locations, the audio rendering/reproduction can degrade significantly. In addition, systems that employ a large number of speakers often do not utilize all of the speakers when rendering channel-based surround sound audio encoded for fewer speakers.

SUMMARY OF THE INVENTION

Audio recording and reproduction technology has consistently striven for a higher fidelity experience. The ability to reproduce sound as if the listener were in the room with the musicians has been a key promise that the industry has attempted to fulfill. However, to date, the highest fidelity spatially accurate reproductions have come at the cost of large speaker arrays that must be arranged in a particular orientation with respect to the ideal listener location. Systems and methods described herein can ameliorate these problems and provide additional functionality by applying spatial audio reproduction principles to spatial audio rendering.

Systems and methods for rendering spatial audio in accordance with embodiments of the invention are illustrated. One embodiment includes a spatial audio system, including a primary network connected speaker, including a plurality of sets of drivers, where each set of drivers is oriented in a different direction, a processor system, memory containing an audio player application, wherein the audio player application configures the processor system to obtain an audio source stream from an audio source via the network interface, spatially encode the audio source, decode the spatially encoded audio source to obtain driver inputs for the individual drivers in the plurality of sets of drivers, where the driver inputs cause the drivers to generate directional audio.

In another embodiment, the primary network connected speaker includes three sets of drivers, where each set of drivers includes a mid-frequency driver and a tweeter.

In a further embodiment, the primary network connected speaker further includes three horns in a circular arrangement, where each horn is fed by a set of a mid-frequency driver and a tweeter.

In still another embodiment, the primary network connected speaker further includes a pair of opposing sub-woofer drivers mounted perpendicular to the circular arrangement of the three horns.

In a still further embodiment, the driver inputs cause the drivers to generate directional audio using modal beamforming.

In yet another embodiment, the audio source is a channel-based audio source, and the audio player application configures the processor system to spatially encode the channel-based audio source by generating a plurality of spatial audio objects based upon the channel-based audio source, where each spatial audio object is assigned a location and has an associated audio signal, and encoding a spatial audio representation of the plurality of spatial audio objects.

In a yet further embodiment, the audio player application configures the processor system to decode the spatially encoded audio source to obtain driver inputs for the individual drivers in the plurality of sets of drivers by decoding the spatial audio representation of the plurality of spatial audio objects to obtain audio inputs for a plurality of virtual speakers, and decoding the audio input for at least one of the plurality of virtual speakers to obtain driver inputs for the individual drivers in the plurality of sets of drivers.
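As a rough illustration of the first decoding stage described above, the following sketch projects a spatial audio representation onto a regular ring of virtual speakers. It assumes, purely for illustration, that the representation is a horizontal first order ambisonic (B-format) signal with W scaled by 1/√2, and it uses a simple projection decoder; the function name `decode_to_ring` and the choice of decoder are hypothetical and are not taken from the specification.

```python
import math


def decode_to_ring(w, x, y, speaker_count=8):
    """Project one horizontal first-order B-format sample onto a regular ring.

    Assumes W carries the source scaled by 1/sqrt(2) and that virtual speaker i
    sits at azimuth 2*pi*i/speaker_count. Returns one feed per virtual speaker.
    """
    if speaker_count < 3:
        raise ValueError("a first-order ring decode needs at least 3 speakers")
    feeds = []
    for i in range(speaker_count):
        theta = 2.0 * math.pi * i / speaker_count
        # Omnidirectional term plus the first-order directional terms.
        feed = (math.sqrt(2.0) * w
                + 2.0 * (x * math.cos(theta) + y * math.sin(theta)))
        feeds.append(feed / speaker_count)
    return feeds


# A unit source straight ahead (azimuth 0): W = 1/sqrt(2), X = 1, Y = 0.
feeds = decode_to_ring(1.0 / math.sqrt(2.0), 1.0, 0.0)
```

With this decoder the virtual speaker nearest the source direction receives the largest feed, and the feeds across the ring sum to the original sample. A second stage, not shown, would map each virtual speaker feed to driver inputs.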

In another additional embodiment, the audio player application configures the processor system to decode an audio input for at least one of the plurality of virtual speakers to obtain driver inputs for the individual drivers in the plurality of sets of drivers by encoding a spatial audio representation of at least one of the plurality of virtual speakers based upon the location of the primary network connected speaker, and decoding the spatial audio representation of at least one of the plurality of virtual speakers to obtain driver inputs for the individual drivers in the plurality of sets of drivers.

In a further additional embodiment, the audio player application configures the processor system to decode an audio input for at least one of the plurality of virtual speakers to obtain driver inputs for the individual drivers in the plurality of sets of drivers using a filter for each set of drivers.

In another embodiment again, the audio player application configures the processor system to decode the spatial audio representation of the plurality of spatial audio objects to obtain audio inputs for a plurality of virtual speakers by decoding the spatial audio representation of the plurality of spatial audio objects to obtain a set of direct audio inputs for the plurality of virtual speakers, and decoding the spatial audio representation of the plurality of spatial audio objects to obtain a set of diffuse audio inputs for the plurality of virtual speakers.

In a further embodiment again, the plurality of virtual speakers includes at least 8 virtual speakers arranged in a ring.

In still yet another embodiment, the audio player application configures the processor system to spatially encode the audio source into at least one spatial representation selected from the group consisting of: a first order ambisonic representation; a higher order ambisonic representation; Vector Based Amplitude Panning (VBAP) representation; Distance Based Amplitude Panning (DBAP) representation; and K Nearest Neighbors Panning representation.
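To make the first listed option concrete, the sketch below encodes a single mono sample into a first order ambisonic representation. It assumes the traditional B-format convention in which the omnidirectional W channel is scaled by 1/√2, with azimuth measured counterclockwise from straight ahead; the function name `encode_first_order` is hypothetical, and other normalization conventions are equally possible.

```python
import math


def encode_first_order(sample, azimuth_rad, elevation_rad=0.0):
    """Encode one mono sample into first-order B-format (W, X, Y, Z).

    Assumes the traditional convention where W is scaled by 1/sqrt(2).
    """
    w = sample / math.sqrt(2.0)
    x = sample * math.cos(azimuth_rad) * math.cos(elevation_rad)
    y = sample * math.sin(azimuth_rad) * math.cos(elevation_rad)
    z = sample * math.sin(elevation_rad)
    return (w, x, y, z)


# A unit sample placed 90 degrees to the left in the horizontal plane.
w, x, y, z = encode_first_order(1.0, math.pi / 2.0)
```

Encoding several spatial audio objects amounts to encoding each object's signal at its assigned location and summing the resulting channels.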

In a still yet further embodiment, each of the plurality of spatial audio objects corresponds to a channel of the channel-based audio source.

In still another additional embodiment, a number of spatial audio objects that is greater than the number of channels of the channel-based audio source is obtained using upmixing of the channel-based audio source.

In a still further additional embodiment, the plurality of spatial audio objects includes direct spatial audio objects and diffuse spatial audio objects.

In still another embodiment again, the audio player application configures the processor system to assign predetermined locations to the plurality of spatial audio objects based upon a layout determined by the number of channels of the channel-based audio source.
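One way to picture a layout determined by channel count is a lookup table keyed on the number of channels. The sketch below is illustrative only: the table name `PRESET_LAYOUTS`, the function `assign_object_locations`, and the azimuth values (which loosely follow common stereo and 5.1 loudspeaker placements) are assumptions rather than values given in the specification.

```python
# Hypothetical preset azimuths in degrees (positive = listener's left).
# The LFE channel has no meaningful direction, so it maps to None.
PRESET_LAYOUTS = {
    1: {"C": 0.0},
    2: {"L": 30.0, "R": -30.0},
    6: {"L": 30.0, "R": -30.0, "C": 0.0, "LFE": None,
        "Ls": 110.0, "Rs": -110.0},
}


def assign_object_locations(channel_count):
    """Return a copy of the preset channel-to-azimuth map for this channel count."""
    try:
        layout = PRESET_LAYOUTS[channel_count]
    except KeyError:
        raise ValueError(
            f"no preset layout for {channel_count} channels") from None
    return dict(layout)


stereo_locations = assign_object_locations(2)
```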

In a still further embodiment again, the audio player application configures the processor system to assign a location to a spatial audio object based upon user input.

In yet another additional embodiment, the audio player application configures the processor system to assign a location to a spatial audio object that changes over time programmatically.

In a yet further additional embodiment, the spatial audio system further includes at least one secondary network connected speaker, wherein the audio player application of the primary network connected speaker further configures the processor system to decode the spatially encoded audio source to obtain a set of audio streams for each of the at least one secondary network connected speakers based upon a layout of the primary and at least one secondary network connected speaker, and transmit the set of audio streams for each of the at least one secondary network connected speaker to each of the at least one secondary network connected speaker, and each of the at least one secondary network connected speaker includes a plurality of sets of drivers, where each set of drivers is oriented in a different direction, a processor system, memory containing a secondary audio player application, wherein the secondary audio player application configures the processor system to receive a set of audio streams from the primary network connected speaker, where the set of audio streams includes a separate audio stream for each of the plurality of sets of drivers, obtain driver inputs for the individual drivers in the plurality of sets of drivers based upon the received set of audio streams, where the driver inputs cause the drivers to generate directional audio.

In yet another embodiment again, each of the primary network connected speaker and the at least one secondary network connected speaker includes at least one microphone, and the audio player application of the primary network connected speaker further configures the processor system to determine the layout of the primary and at least one secondary network connected speaker using audio ranging.
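The core arithmetic behind audio ranging can be sketched as a time-of-flight calculation: one speaker emits a test sound, another records its arrival with a microphone, and the distance is the elapsed time multiplied by the speed of sound. The sketch assumes the two speakers share a synchronized clock and uses an approximate speed of sound of 343 m/s; the function name is hypothetical and the specification does not prescribe this particular procedure.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # approximate value for dry air near 20 degrees C


def distance_from_time_of_flight(emit_time_s, arrival_time_s,
                                 speed=SPEED_OF_SOUND_M_PER_S):
    """Estimate the distance in meters between an emitting speaker and a listening microphone.

    Assumes both timestamps come from a shared, synchronized clock.
    """
    time_of_flight = arrival_time_s - emit_time_s
    if time_of_flight < 0:
        raise ValueError("arrival time precedes emit time")
    return speed * time_of_flight


# A test sound that arrives 10 ms after emission implies roughly 3.43 m.
separation_m = distance_from_time_of_flight(0.0, 0.010)
```

Pairwise distances gathered this way between all speakers could then be combined to estimate the overall layout.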

In a yet further embodiment again, the primary network connected speaker and the at least one secondary speaker include at least one of two network connected speakers arranged in a horizontal line, three network connected speakers arranged as a triangle on a horizontal plane, and three network connected speakers arranged as a triangle on a horizontal plane with a fourth network connected speaker positioned above the horizontal plane.

In another embodiment, a network connected speaker includes three horns in a circular arrangement, where each horn is fed by a set of a mid-frequency driver and a tweeter, at least one sub-woofer driver mounted perpendicular to the circular arrangement of the three horns, a processor system, memory containing an audio player application, a network interface, wherein the audio player application configures the processor system to obtain an audio source stream from an audio source via the network interface and generate driver inputs.

In a further embodiment, the at least one sub-woofer driver includes a pair of opposing sub-woofer drivers.

In still another embodiment, the sub-woofer drivers each include a diaphragm constructed from a material comprising a triaxial carbon fiber weave.

In a still further embodiment, the driver inputs cause the drivers to generate directional audio using modal beamforming.

In another embodiment, a method of rendering spatial audio from an audio source includes receiving an audio source stream from an audio source at a processor configured by an audio player application, spatially encoding the audio source using the processor configured by the audio player application, and decoding the spatially encoded audio source to obtain driver inputs for individual drivers in a plurality of sets of drivers using at least the processor configured by the audio player application, where each of the plurality of sets of drivers is oriented in a different direction, and the driver inputs cause the drivers to generate directional audio, and rendering spatial audio using the plurality of sets of drivers.

In a further embodiment, several of the plurality of sets of drivers are contained within a primary network connected playback device that includes the processor configured by the audio player application, the remainder of the plurality of sets of drivers are contained within at least one secondary network connected playback device, and each of the at least one secondary network connected playback device is in network communication with the primary connected playback device.

In still another embodiment, decoding the spatially encoded audio source to obtain driver inputs for individual drivers in a plurality of sets of drivers further includes decoding the spatially encoded audio source to obtain driver inputs for individual drivers of the primary network connected playback device using the processor configured by the audio player application, decoding the spatially encoded audio source to obtain audio streams for each of the sets of drivers of each of the at least one secondary network connected playback device using the processor configured by the audio player application, transmitting the set of audio streams for each of the at least one secondary network connected speaker to each of the at least one secondary network connected speaker, and each of the at least one secondary network connected speaker generating driver inputs for its individual drivers based upon a received set of audio streams.

In a still further embodiment, the audio source is a channel-based audio source, and spatially encoding the audio source further includes generating a plurality of spatial audio objects based upon the channel-based audio source, where each spatial audio object is assigned a location and has an associated audio signal, and encoding a spatial audio representation of the plurality of spatial audio objects.

In yet another embodiment, decoding the spatially encoded audio source to obtain driver inputs for individual drivers in a plurality of sets of drivers further includes decoding the spatial audio representation of the plurality of spatial audio objects to obtain audio inputs for a plurality of virtual speakers, and decoding the audio inputs of the plurality of virtual speakers to obtain driver inputs for the individual drivers in the plurality of sets of drivers.

In a yet further embodiment, decoding the audio inputs of the plurality of virtual speakers to obtain driver inputs for the individual drivers in the plurality of sets of drivers further includes encoding a spatial audio representation of at least one of the plurality of virtual speakers based upon the location of the primary network connected speaker, and decoding the spatial audio representation of at least one of the plurality of virtual speakers to obtain driver inputs for the individual drivers in the plurality of sets of drivers.

In another additional embodiment, decoding the audio inputs of the plurality of virtual speakers to obtain driver inputs for the individual drivers in the plurality of sets of drivers further includes using a filter for each set of drivers.

In a further additional embodiment, decoding the spatial audio representation of the plurality of spatial audio objects to obtain audio inputs for a plurality of virtual speakers further includes decoding the spatial audio representation of the plurality of spatial audio objects to obtain a set of direct audio inputs for the plurality of virtual speakers, and decoding the spatial audio representation of the plurality of spatial audio objects to obtain a set of diffuse audio inputs for the plurality of virtual speakers.

In another embodiment again, the plurality of virtual speakers includes at least 8 virtual speakers arranged in a ring.

In a further embodiment again, spatially encoding the audio source includes spatially encoding the audio source into at least one spatial representation selected from the group consisting of a first order ambisonic representation, a higher order ambisonic representation, Vector Based Amplitude Panning (VBAP) representation, Distance Based Amplitude Panning (DBAP) representation, and K Nearest Neighbors Panning representation.
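For the VBAP option in this list, a minimal two-dimensional sketch is given below. It expresses the source direction as a weighted sum of the unit vectors of two adjacent speakers and then normalizes the weights to constant power, which is the usual pairwise formulation of vector based amplitude panning. The function name `vbap_pair_gains` and the example angles are assumptions made for illustration.

```python
import math


def vbap_pair_gains(source_az, spk1_az, spk2_az):
    """Return constant-power gains (g1, g2) for a source between two speakers.

    All angles are azimuths in radians. Solves p = g1*l1 + g2*l2 for the
    unit direction vectors p, l1, l2 and normalizes so g1^2 + g2^2 = 1.
    """
    px, py = math.cos(source_az), math.sin(source_az)
    l1x, l1y = math.cos(spk1_az), math.sin(spk1_az)
    l2x, l2y = math.cos(spk2_az), math.sin(spk2_az)
    det = l1x * l2y - l1y * l2x
    if abs(det) < 1e-12:
        raise ValueError("speaker directions must not be collinear")
    # Cramer's rule for the 2x2 system.
    g1 = (px * l2y - py * l2x) / det
    g2 = (l1x * py - l1y * px) / det
    norm = math.hypot(g1, g2)
    return (g1 / norm, g2 / norm)


# A source centered between speakers at +30 and -30 degrees.
g_left, g_right = vbap_pair_gains(0.0, math.radians(30.0), math.radians(-30.0))
```

A source exactly in the middle of the pair receives equal gains, and a source at one speaker's direction is routed entirely to that speaker.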

In another embodiment, a spatial audio system includes a primary network connected speaker configured to obtain an audio stream comprising at least one audio signal, obtain location data describing the physical location of the primary network connected speaker, transform the at least one audio signal into a spatial representation, transform the spatial representation based on a virtual speaker layout, generate a separate audio signal for each horn of the primary network connected speaker, and play back the separate audio signals corresponding to the horns of the primary network connected speaker using at least one driver for each horn.

In a further embodiment, the spatial audio system further includes at least one secondary network connected speaker, and the primary network connected speaker is further configured to obtain location data describing the physical location of the at least one secondary network connected speaker, generate a separate audio signal for each horn of the at least one secondary network connected speaker, and transmit the separate audio signals to the at least one secondary network connected speaker associated with the horn for each separate audio signal.

In still another embodiment, the primary network connected speaker is a super primary network connected speaker, and the super primary network connected speaker is further configured to transmit the audio stream to a second primary network connected speaker.

In a still further embodiment, the primary network connected speaker is capable of establishing a wireless network joinable by other network connected speakers.

In yet another embodiment, the primary network connected speaker is controllable by a control device.

In a yet further embodiment, the control device is a smart phone.

In another additional embodiment, the primary network connected speaker is capable of generating a mel spectrogram of the audio signal, and transmitting the mel spectrogram as metadata to a visualization device for use in visualizing the audio signal as a visualization helix.
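A mel spectrogram groups spectral energy into bands spaced evenly on the mel scale rather than in hertz. The sketch below shows only that frequency mapping, using one widely used formula for the mel scale (mel = 2595 · log10(1 + f/700)); the function names and the band count in the example are hypothetical, and the specification does not state which mel variant is intended.

```python
import math


def hz_to_mel(freq_hz):
    """Convert frequency in hertz to mels using a common mel-scale formula."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(low_hz, high_hz, band_count):
    """Return band_count + 2 frequencies (Hz) spaced evenly on the mel scale.

    Consecutive triples of edges can serve as the lower edge, center, and
    upper edge of each band in a mel filter bank.
    """
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (band_count + 1)
    return [mel_to_hz(low_mel + i * step) for i in range(band_count + 2)]


edges = mel_band_edges(0.0, 8000.0, 40)
```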

In a further additional embodiment, the generated separate audio signals can be used to directly drive a driver.

In another embodiment again, the virtual speaker layout includes a ring of virtual speakers.

In a further embodiment again, the ring of virtual speakers includes at least eight virtual speakers.

In still yet another embodiment, virtual speakers in the virtual speaker layout are regularly spaced.
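A regularly spaced ring of virtual speakers can be generated by dividing the full circle evenly, as in the sketch below. The function name `virtual_speaker_ring`, the unit radius, and the convention that the first virtual speaker lies on the positive x axis are assumptions chosen for illustration.

```python
import math


def virtual_speaker_ring(count=8, radius=1.0):
    """Return (x, y) positions of `count` virtual speakers evenly spaced on a circle."""
    if count < 1:
        raise ValueError("count must be at least 1")
    positions = []
    for i in range(count):
        azimuth = 2.0 * math.pi * i / count  # equal angular spacing
        positions.append((radius * math.cos(azimuth),
                          radius * math.sin(azimuth)))
    return positions


ring = virtual_speaker_ring(8)
```

With eight virtual speakers the angular spacing is 45 degrees.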

In another embodiment, a spatial audio system includes a first network connected speaker at a first location, and a second network connected speaker at a second location, where the first network connected speaker and the second network connected speaker are configured to synchronously render audio signals such that at least one sound object is rendered at a location different than the first location and the second location based on driver signals generated by the first modal beamforming speaker.

In a further embodiment, the spatial audio system further includes a third network connected speaker at a third location configured to synchronously render audio signals with the first and second network connected speakers.

In still another embodiment, the spatial audio system further includes a fourth network connected speaker at a fourth location configured to synchronously render audio signals with the first, second, and third network connected speakers, and the fourth location is at a higher altitude than the first, second, and third locations.

In a still further embodiment, the first, second, third and fourth locations are all within a room, and the fourth modal beamforming speaker is connected to a ceiling of the room.

In another embodiment, a spatial audio system includes a primary network connected speaker capable of obtaining an audio stream comprising at least one audio signal, obtaining location data describing the physical location of the primary network connected speaker, transforming the at least one audio signal into a spatial representation, transforming the spatial representation based on a virtual speaker layout, generating a separate primary audio signal for each horn of the primary network connected speaker, generating a separate secondary audio signal for each horn of a plurality of secondary network connected speakers, transmitting each separate secondary audio signal to the secondary network connected speaker comprising the respective horn, and playing back the primary separate audio signals corresponding to the horns of the primary network connected speaker using at least one driver for each horn in a synchronized fashion with the plurality of secondary network connected speakers.

In another embodiment, a method of rendering spatial audio includes obtaining an audio signal encoded in a first format using a primary network connected speaker, transforming the audio signal into a spatial representation using the primary network connected speaker, generating a plurality of driver signals based on the spatial representation using the primary network connected speaker, where each driver signal corresponds to at least one driver coupled with a horn, and rendering spatial audio using the plurality of driver signals and the corresponding at least one driver.

In a further embodiment, the method further includes transmitting a portion of the plurality of driver signals to at least one secondary network connected speaker, and rendering the spatial audio using the primary network connected speaker and the at least one secondary network connected speaker in a synchronized fashion.

In still another embodiment, the method further includes generating a mel spectrogram of the audio signal, and transmitting the mel spectrogram as metadata to a visualization device for use in visualizing the audio signal as a visualization helix.

In a still further embodiment, the generating of the plurality of driver signals is based on a virtual speaker layout.

In yet another embodiment, the virtual speaker layout includes a ring of virtual speakers.

In a yet further embodiment, the ring of virtual speakers includes at least eight virtual speakers.

In another additional embodiment, virtual speakers in the virtual speaker layout are regularly spaced.

In a further additional embodiment, the primary network connected speaker is a super primary network connected speaker, and the method further includes transmitting the audio signal to a second primary network connected speaker, transforming the audio signal into a second spatial representation using the second primary network connected speaker, generating a second plurality of driver signals based on the second spatial representation using the second primary network connected speaker, where each driver signal corresponds to at least one driver coupled with a horn, and rendering spatial audio using the plurality of driver signals and the corresponding at least one driver.

In another embodiment again, the second spatial representation is identical to the first spatial representation.

In a further embodiment again, generating a plurality of driver signals based on the spatial representation further includes using a virtual speaker layout.

In still yet another embodiment, the virtual speaker layout includes a ring of virtual speakers.

In a still yet further embodiment, the ring of virtual speakers includes at least eight virtual speakers.

In still another additional embodiment, virtual speakers in the virtual speaker layout are regularly spaced.

In another embodiment, a network connected speaker includes a plurality of horns, where each of the three horns is fitted with a plurality of drivers, and a pair of opposing, coaxial woofers, where the three pluralities of drivers are capable of rendering spatial audio.

In a further embodiment, each plurality of drivers includes a tweeter and a mid.

In still another embodiment, the tweeter and mid are configured to be coaxial and to fire in the same direction.

In a still further embodiment, the tweeter is located over the mid relative to the center of the modal beamforming speaker.

In yet another embodiment, one of the pair of woofers includes a channel through the center of the woofer.

In a yet further embodiment, the woofers include diaphragms that are constructed from a triaxial carbon fiber weave.

In another additional embodiment, the plurality of horns are coplanar, and wherein a first woofer in the pair of woofers is configured to fire perpendicularly to the plane of horns in a positive direction, and a second woofer in the pair of woofers is configured to fire perpendicularly to the plane of horns in a negative direction.

In a further additional embodiment, the plurality of horns are configured in a ring.

In another embodiment again, the plurality of horns includes three horns.

In a further embodiment again, the plurality of horns are regularly spaced.

In still yet another embodiment, the horns form a single component.

In a still yet further embodiment, the plurality of horns forms a seal between two covers.

In still another additional embodiment, at least one back volume for the plurality of drivers is contained between the three horns.

In a still further additional embodiment, the network connected speaker further includes a stem configured to be connected to a stand.

In still another embodiment again, the stem and stand are configured to be connected using a bayonet locking system.

In a still further embodiment again, the stem includes a ring capable of providing playback control signals to the network connected speaker.

In yet another additional embodiment, the network connected speaker is configured to be hung from a ceiling.

In another embodiment, a horn array for a loudspeaker includes a unibody ring molded such that the ring forms a plurality of horns while maintaining radial symmetry.

In a further embodiment, the horn array is manufactured using 3-D printing.

In still another embodiment, the plurality of horns includes 3 horns offset at 120 degrees.

In another embodiment, an audio visualization method includes obtaining an audio signal, generating a mel spectrogram from the audio signal, plotting the mel spectrogram on a helix, such that the points on each turn of the helix offset by one pitch reflect the same musical note in their respective octave, and warping the helix structure based on amplitude such that the volume of each note is visualized by an outward bending of the helix.
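The helix geometry described above can be sketched by letting one full turn correspond to one octave, so that a note and the same note an octave higher share an angle and differ only in height by one pitch of the helix. Amplitude then pushes the point outward. The function name `helix_point`, the 27.5 Hz reference frequency, and the linear radius warp are assumptions made for illustration and are not specified in the text.

```python
import math


def helix_point(freq_hz, amplitude, ref_hz=27.5,
                base_radius=1.0, pitch=1.0, warp=0.5):
    """Map a frequency and amplitude to an (x, y, z) point on a warped helix.

    One turn equals one octave above ref_hz; `pitch` is the height gained
    per turn; the radius grows linearly with amplitude.
    """
    if freq_hz <= 0 or ref_hz <= 0:
        raise ValueError("frequencies must be positive")
    turns = math.log2(freq_hz / ref_hz)       # octaves above the reference
    angle = 2.0 * math.pi * turns             # same note -> same angle
    radius = base_radius + warp * amplitude   # louder -> bends outward
    return (radius * math.cos(angle),
            radius * math.sin(angle),
            pitch * turns)


a4 = helix_point(440.0, 0.0)
a5 = helix_point(880.0, 0.0)
```

Here `a4` and `a5` sit directly above one another, one pitch apart, because 880 Hz is exactly one octave above 440 Hz.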

In a further embodiment, the helix is visualized from above.

In still another embodiment, the helix is colored.

In a still further embodiment, each turn of the helix is colored using a range of colors which is repeated for each turn of the helix.

In yet another embodiment, the color saturation decreases for each turn of the helix.

In a yet further embodiment, the color transparency decreases for each turn of the helix.

In another additional embodiment, the helix structure leaves a trail towards the axis of the helix when warped.

In another embodiment, a method of constructing a network connected speaker includes constructing a plurality of outward facing horns in a ring, fitting a plurality of drivers to each outward facing horn, and fitting a coaxial pair of opposite facing woofers such that one woofer is above the ring and one woofer is below the ring.

In a further embodiment, constructing a plurality of outward facing horns in a ring further includes fabricating the plurality of outward facing horns as a single component.

In still another embodiment, the plurality of outward facing horns are constructed using additive manufacturing.

In a still further embodiment, the construction method further includes placing a rod through the center of a diaphragm of one of the woofers.

In yet another embodiment, a woofer is constructed with a double surround to accommodate a rod through the center of a diaphragm on the woofer.

In a yet further embodiment, each woofer includes a diaphragm made of a triaxial carbon fiber weave.

In another additional embodiment, the construction method further includes fitting a first cover over the top of the ring and fitting a second cover over the bottom of the ring such that the plurality of drivers are within a volume created by the ring, the first cover, and the second cover.

In a further additional embodiment, each horn is associated with a unique tweeter and a unique mid in the plurality of drivers.

In another embodiment again, the construction method further includes placing at least one microphone between each horn on the ring.

Additional embodiments and features are set forth in part in the description that follows, and in part will become apparent to those skilled in the art upon examination of the specification or may be learned by the practice of the invention. A further understanding of the nature and advantages of the present invention may be realized by reference to the remaining portions of the specification and the drawings, which form a part of this disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

The description and claims will be more fully understood with reference to the following figures and data graphs, which are presented as exemplary embodiments of the invention and should not be construed as a complete recitation of the scope of the invention.

FIG. 1A is an example system diagram for a spatial audio system in accordance with an embodiment of the invention.

FIG. 1B is an example system diagram for a spatial audio system in accordance with an embodiment of the invention.

FIG. 1C is an example system diagram for a spatial audio system including a source input device in accordance with an embodiment of the invention.

FIG. 2A is an example room layout for a spatial audio system in accordance with an embodiment of the invention.

FIGS. 2B-2F illustrate exemplary first order ambisonics around a cell in the example room layout of FIG. 2A in accordance with an embodiment of the invention.

FIG. 2G illustrates exemplary second order ambisonics around a cell in the example room layout of FIG. 2A in accordance with an embodiment of the invention.

FIG. 3A illustrates an example room layout for a spatial audio system in accordance with an embodiment of the invention.

FIG. 3B illustrates exemplary first order ambisonics around the cells in the example room layout of FIG. 3A in accordance with an embodiment of the invention.

FIG. 4A illustrates an example room layout for a spatial audio system in accordance with an embodiment of the invention.

FIG. 4B illustrates exemplary first order ambisonics around the cells in the example room layout of FIG. 4A in accordance with an embodiment of the invention.

FIG. 5A illustrates an example room layout for a spatial audio system in accordance with an embodiment of the invention.

FIG. 5B illustrates exemplary first order ambisonics around the cells in the example room layout of FIG. 5A in accordance with an embodiment of the invention.

FIG. 6A illustrates an example room layout for a spatial audio system in accordance with an embodiment of the invention.

FIG. 6B illustrates exemplary first order ambisonics around the cells in the example room layout of FIG. 6A in accordance with an embodiment of the invention.

FIG. 7A illustrates an example room layout for a spatial audio system in accordance with an embodiment of the invention.

FIG. 7B illustrates exemplary first order ambisonics around the cells in the example room layout of FIG. 7A in accordance with an embodiment of the invention.

FIG. 8A illustrates an example home containing cells in accordance with an embodiment of the invention.

FIG. 8B illustrates the example home organized into various groups in accordance with an embodiment of the invention.

FIG. 8C illustrates the example home organized into various zones in accordance with an embodiment of the invention.

FIG. 8D illustrates the example home containing cells in accordance with an embodiment of the invention.

FIG. 9 illustrates a spatial audio system in accordance with an embodiment of the invention.

FIG. 10 illustrates a process for rendering sound fields using a spatial audio system in accordance with an embodiment of the invention.

FIG. 11 illustrates a process for spatial audio control and reproduction in accordance with an embodiment of the invention.

FIGS. 12A-12D illustrate relative positions of sound objects within a system encoder and a speaker node encoder in accordance with an embodiment of the invention.

FIGS. 13A-13D visually illustrate an example process for mapping 5.1 channel audio to three cells in accordance with an embodiment of the invention.

FIG. 14 illustrates a process for processing sound information in accordance with an embodiment of the invention.

FIG. 15 illustrates sets of drivers in a driver array of a cell in accordance with an embodiment of the invention.

FIG. 16 illustrates a process for rendering spatial audio in a diffuse and a directed fashion in accordance with an embodiment of the invention.

FIG. 17 is a process for propagating virtual speaker placements to cells in accordance with an embodiment of the invention.

FIG. 18A illustrates a cell in accordance with an embodiment of the invention.

FIG. 18B is a render of a halo of a cell in accordance with an embodiment of the invention.

FIG. 18C is a cross section of the halo in accordance with an embodiment of the invention.

FIG. 18D illustrates an exploded view of a coaxial alignment of drivers for a single horn of a halo in accordance with an embodiment of the invention.

FIG. 18E illustrates a socketed set of drivers for each horn in a halo in accordance with an embodiment of the invention.

FIG. 18F is a horizontal cross section of the halo in accordance with an embodiment of the invention.

FIG. 18G illustrates a circuit board annulus and bottom portion of the housing of a core of a cell in accordance with an embodiment of the invention.

FIG. 18H is an illustration of a halo and core in accordance with an embodiment of the invention.

FIG. 18I is an illustration of a halo, core, and crown in accordance with an embodiment of the invention.

FIG. 18J is an illustration of a halo, core, crown, and lungs in accordance with an embodiment of the invention.

FIGS. 18K and 18L illustrate opposing woofers in accordance with an embodiment of the invention.

FIGS. 18M and 18N are cross sections of the opposing woofers in accordance with an embodiment of the invention.

FIG. 18O illustrates a cell with a stem in accordance with an embodiment of the invention.

FIG. 18P illustrates an example connector on the bottom of a stem in accordance with an embodiment of the invention.

FIG. 18Q is a cross section of a cell in accordance with an embodiment of the invention.

FIG. 18R is an exploded view of a cell in accordance with an embodiment of the invention.

FIGS. 19A-19D illustrate a cell on several stand variants in accordance with embodiments of the invention.

FIG. 20 illustrates a control ring on a stem in accordance with an embodiment of the invention.

FIG. 21 is a cross section of a stem and control ring in accordance with an embodiment of the invention.

FIG. 22 is an illustration of a control ring rotation in accordance with an embodiment of the invention.

FIG. 23 is a close view of a portion of the control ring mechanism for detecting rotation in accordance with an embodiment of the invention.

FIG. 24 is an illustration of a control ring click in accordance with an embodiment of the invention.

FIG. 25 is a close view of a portion of the control ring mechanism for detecting clicks in accordance with an embodiment of the invention.

FIG. 26 is an illustration of a control ring vertical movement in accordance with an embodiment of the invention.

FIG. 27 is a close view of a portion of the control ring mechanism for detecting vertical movement in accordance with an embodiment of the invention.

FIG. 28 is a close view of a portion of the control ring mechanism for detecting rotation on a secondary plane in accordance with an embodiment of the invention.

FIG. 29 visually illustrates a process for locking a stem to a stand using a bayonet based locking system in accordance with an embodiment of the invention.

FIG. 30 is a cross section of a bayonet based locking system in accordance with an embodiment of the invention.

FIGS. 31A and 31B illustrate a locked and unlocked position for a bayonet based locking system in accordance with an embodiment of the invention.

FIG. 32 is a block diagram illustrating cell circuitry in accordance with an embodiment of the invention.

FIG. 33 illustrates an example hardware implementation of a cell in accordance with an embodiment of the invention.

FIG. 34 illustrates a source manager in accordance with an embodiment of the invention.

FIG. 35 illustrates a position manager in accordance with an embodiment of the invention.

FIG. 36 illustrates an example UI for controlling the placement of sound objects in a space in accordance with an embodiment of the invention.

FIGS. 37A and 37B illustrate an example UI for controlling the placement of and splitting sound objects in a space in accordance with an embodiment of the invention.

FIG. 38 illustrates an example UI for controlling volume and rendering of sound objects in accordance with an embodiment of the invention.

FIG. 39 illustrates a sound object in an augmented reality environment in accordance with an embodiment of the invention.

FIG. 40 illustrates sound objects in an augmented reality environment in accordance with an embodiment of the invention.

FIG. 41 illustrates an example UI for configuration operations in accordance with an embodiment of the invention.

FIG. 42 illustrates an example UI for an integrated digital instrument in accordance with an embodiment of the invention.

FIG. 43 illustrates an example UI for managing wave pinning in accordance with an embodiment of the invention.

FIG. 44 illustrates a series of UI screens for tracking the movement of sound objects in accordance with an embodiment of the invention.

FIG. 45 conceptually illustrates audio objects in a space to create the sensation of stereo everywhere in accordance with an embodiment of the invention.

FIG. 46 conceptually illustrates placing audio objects relative to a virtual stage in accordance with an embodiment of the invention.

FIG. 47 conceptually illustrates placing audio objects in 3D space in accordance with an embodiment of the invention.

FIG. 48 conceptually illustrates software of a cell that can be configured to act as a primary cell or a secondary cell in accordance with an embodiment of the invention.

FIG. 49 conceptually illustrates a sound server software implementation in accordance with an embodiment of the invention.

FIG. 50 illustrates a spatial encoder that can be utilized to encode a mono source in accordance with an embodiment of the invention.

FIG. 51 illustrates a source encoder in accordance with an embodiment of the invention.

FIG. 52 is a graph showing generation of individual driver feeds based upon three audio signals corresponding to feeds for each of a set of three horns in accordance with an embodiment of the invention.

FIG. 53 illustrates audio data distribution in a hierarchy with a super primary cell in accordance with an embodiment of the invention.

FIG. 54 illustrates audio data distribution in a hierarchy with two super primary cells in accordance with an embodiment of the invention.

FIG. 55 illustrates audio data distribution in a hierarchy with a super primary cell with communication between cells over a Wi-Fi router in accordance with an embodiment of the invention.

FIG. 56 illustrates audio data distribution in a hierarchy without super primary cells in accordance with an embodiment of the invention.

FIG. 57 is a flow chart for a primary cell election process in accordance with an embodiment of the invention.

FIGS. 58A and 58B illustrate a visualization helix from a side and top perspective, respectively, in accordance with an embodiment of the invention.

FIG. 59 illustrates a helix based visualization in accordance with an embodiment of the invention.

FIG. 60 illustrates four helix based visualizations for different tracks in an audio stream in accordance with an embodiment of the invention.

DETAILED DESCRIPTION

Turning now to the drawings, systems and methods for spatial audio rendering are illustrated. Spatial audio systems in accordance with many embodiments of the invention include one or more network connected speakers that can be referred to as “cells”. In several embodiments, the spatial audio system is able to receive an arbitrary audio source as an input and render spatial audio in a manner determined based upon the specific number and placement of cells in a space. In this way, audio sources that are encoded assuming a specific number and/or placement of speakers (e.g. channel-based surround sound audio formats) can be re-encoded so that the audio reproduction is decoupled from speaker layout. The re-encoded audio can then be rendered in a manner that is specific to the particular number and layout of cells available to the spatial audio system to render the sound field. In a number of embodiments, the quality of the spatial audio is enhanced through the use of directional audio via active directivity control. In many embodiments, spatial audio systems employ cells that include arrays of drivers that enable the generation of directional audio using techniques including (but not limited to) modal beamforming. In this way, spatial audio systems that can render a variety of spatial audio formats can be constructed using only a single cell and enhanced (potentially through acquisition over time) with additional cells.

As noted above, a limitation of typical channel-based surround sound audio systems is the requirement for a specific number of speakers and prescribed placement of those speakers. Spatial audio reproduction techniques such as (but not limited to) ambisonic techniques, Vector Based Amplitude Panning (VBAP) techniques, Distance Based Amplitude Panning (DBAP) techniques, and k-Nearest-Neighbors panning (kNN panning) techniques were developed to provide a speaker-layout independent audio format that could address the limitations of channel based audio. The use of ambisonics as a sound field reproduction technique was initially described in Gerzon, M. A., 1973. Periphony: With-height sound reproduction. Journal of the Audio Engineering Society, 21(1), pp. 2-10. Ambisonics enable representation of sound fields using spherical harmonics. First order ambisonics refers to the representation of a sound field using first order spherical harmonics. The set of signals generated by a typical first-order ambisonic encoding are often referred to as “B-format” signals and include components labelled W for the sound pressure at a particular origin location, X for the front-minus-back sound pressure gradient, Y for the left-minus-right sound pressure gradient, and Z for the up-minus-down sound pressure gradient. A key feature of the B-format is that it is a speaker-independent representation of a sound field. Ambisonic encodings are characterized in that they reflect source directions in a manner that is independent of speaker placement.
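By way of illustration only, the B-format components described above can be sketched for a single mono sample as follows. The function name, the angle conventions (azimuth measured counterclockwise from the front, elevation measured upward from the horizontal plane), and the traditional 1/√2 weighting of W are assumptions of this sketch and are not taken from the specification.

```python
import math

def encode_b_format(sample, azimuth_deg, elevation_deg):
    """Encode one mono sample into first-order B-format (W, X, Y, Z).

    Hypothetical helper: assumes azimuth is measured counterclockwise from
    the front and that W carries the traditional 1/sqrt(2) weighting.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)               # omnidirectional pressure
    x = sample * math.cos(az) * math.cos(el)  # front-minus-back gradient
    y = sample * math.sin(az) * math.cos(el)  # left-minus-right gradient
    z = sample * math.sin(el)                 # up-minus-down gradient
    return w, x, y, z
```

Note that none of the four signals depends on where any loudspeaker is placed, which is the speaker-independence property referred to above.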

Conventional spatial audio reproduction systems are generally limited by constraints similar to those of channel-based surround sound audio systems in that these spatial audio reproduction systems often require a large number of speakers with specific speaker placements. For example, rendering of spatial audio from an ambisonic representation of a sound field ideally involves the use of a group of loudspeakers arranged uniformly around the listener on a circle or on the surface of a sphere. When speakers are placed in this manner, an ambisonic decoder can generate audio input signals for each speaker that will recreate the desired sound field using a linear combination of the B-format signals.
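As a minimal sketch of such a linear combination, the following computes one feed per loudspeaker for a uniform horizontal ring. It assumes a basic projection-style decoder, a W signal carrying the traditional 1/√2 weighting, and at least three speakers with speaker 0 at the front; practical decoders commonly add further weighting and filtering, and nothing here is asserted to be the decoder used in any particular embodiment.

```python
import math

def decode_to_ring(w, x, y, num_speakers):
    """Return one feed per speaker for a uniform horizontal ring.

    Hypothetical basic decoder: each feed is a fixed linear combination of
    the horizontal B-format signals W, X and Y (Z is ignored on a ring).
    """
    feeds = []
    for i in range(num_speakers):
        theta = 2.0 * math.pi * i / num_speakers  # speaker angle from front
        feed = (math.sqrt(2.0) * w
                + 2.0 * (x * math.cos(theta) + y * math.sin(theta)))
        feeds.append(feed / num_speakers)
    return feeds
```

For a source encoded directly in front, the front speaker receives the largest feed and the feeds sum to the original sample value.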

Systems and methods in accordance with many embodiments of the invention enable the generation of sound fields using an arbitrary number and/or placement of cells by encoding one or more audio sources into a spatial audio representation such as (but not limited to) an ambisonic representation, a VBAP representation, a DBAP representation and/or a kNN panning representation. In several embodiments, the spatial audio system decodes an audio source in a manner that creates a number of spatial audio objects. Where the audio source is a channel-based audio source, each channel can be assigned to a spatial audio object that is placed by the spatial audio system in a desired surround sound speaker layout. When the audio source is a set of master recordings, then the spatial audio system can assign each track to a separate spatial audio object that can be placed in 3D space based upon a band performance layout template. In many embodiments, the user can modify the placement of the spatial audio objects through any of a number of user input modalities. Once the placement of the audio objects is determined, a spatial encoding (e.g. an ambisonic encoding) of the audio objects can be created.
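The assignment of channels to spatial audio objects can be sketched as follows for a 5.1 source. The class name, function name, default distance, and the nominal azimuths (front left and right at ±30 degrees, surrounds at ±110 degrees, positive to the listener's left) are illustrative assumptions; the LFE channel is skipped here because it carries no direction.

```python
from dataclasses import dataclass

@dataclass
class SpatialAudioObject:
    name: str           # channel or track label
    azimuth_deg: float  # assumed convention: 0 is front, positive is left
    distance_m: float   # assumed distance from the listening position

# Assumed nominal 5.1 azimuths; a user could later move any object.
LAYOUT_5_1 = {"L": 30.0, "R": -30.0, "C": 0.0, "Ls": 110.0, "Rs": -110.0}

def objects_from_channels(channel_names, layout=LAYOUT_5_1, distance_m=2.0):
    """Create one spatial audio object per directional channel."""
    return [SpatialAudioObject(name, layout[name], distance_m)
            for name in channel_names if name in layout]
```

Each resulting object can then be repositioned independently before the spatial encoding is created.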

In various embodiments, spatial audio systems employ a hierarchy of primary cells and secondary cells. In many embodiments, primary cells are responsible for generating spatial encodings and subsequently decoding the spatial audio into a separate stream (or set of streams) for the secondary cells that they govern. To do this, a primary cell can use an audio source to obtain a set of spatial audio objects, obtain a spatial representation of each audio object, and then decode the spatial representation of each audio object based upon a layout of cells. The primary cell can then re-encode the information based on the location and orientation of each secondary cell that it governs, and can unicast the encoded audio streams to their respective secondary cells. The secondary cells in turn can render their received audio stream to generate driver inputs.

In a number of embodiments, the spatial encodings are performed within a nested architecture involving encoding the spatial objects into ambisonic representations. In many embodiments, the spatial encodings performed within the nested architecture utilize higher order ambisonics (e.g. sound field representation), a VBAP representation, a DBAP representation and/or a kNN panning representation. As can readily be appreciated, any of a variety of spatial audio encoding techniques can be utilized within a nested architecture as appropriate to the requirements of specific applications in accordance with various embodiments of the invention. Furthermore, the specific manner in which spatial representations of audio objects are decoded to provide audio signals to individual cells can depend upon factors including (but not limited to) the number of audio objects, the number of virtual speakers (where the nested architecture utilizes virtual speakers) and/or the number of cells.

In several embodiments, the spatial audio system can determine the spatial relationships between the cells using a variety of ranging techniques including (but not limited to) acoustic ranging and visual mapping using a camera that is part of a user device that can communicate with the spatial audio system. In many embodiments, the cells include microphone arrays and can determine both orientation and spacing. Once the spatial relationship between the cells is known, spatial audio systems in accordance with a number of embodiments of the invention can utilize the cell layout to configure their nested encoding architecture. In numerous embodiments, cells can map their physical environment, which can further be used in the encoding and/or decoding of spatial audio. For example, cells can generate room impulse responses to map their environment. The room impulse responses could be used to find the distance to walls, floor, and/or ceiling as well as to identify and/or correct the acoustical problems created by the room. As can readily be appreciated, any of a variety of techniques can be utilized to generate room impulse responses and/or map environments for use in spatial audio rendering as appropriate to the requirements of specific applications in accordance with various embodiments of the invention.
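One simple way a room impulse response could yield a distance to a reflecting surface is sketched below. The sketch assumes that the emitting driver and the measuring microphone are effectively co-located on the cell, that the strongest sample is the direct sound, and that sound travels at roughly 343 m/s; the function name and the detection threshold are hypothetical, and a practical implementation would need more robust peak detection.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air near 20 degrees C

def first_reflection_distance(impulse_response, sample_rate_hz, threshold=0.2):
    """Estimate the distance in meters to the nearest reflecting surface.

    Finds the first sample after the direct sound whose magnitude reaches
    `threshold` times the direct level, then halves the round-trip path.
    Returns None when no such reflection is found.
    """
    if not impulse_response:
        return None
    direct_index = max(range(len(impulse_response)),
                       key=lambda i: abs(impulse_response[i]))
    direct_level = abs(impulse_response[direct_index])
    if direct_level == 0.0:
        return None
    for i in range(direct_index + 1, len(impulse_response)):
        if abs(impulse_response[i]) >= threshold * direct_level:
            delay_s = (i - direct_index) / sample_rate_hz
            return SPEED_OF_SOUND_M_S * delay_s / 2.0  # out and back
    return None
```

For instance, a reflection arriving 280 samples after the direct sound at a 48 kHz sample rate corresponds to a surface roughly one meter away.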

As noted above, spatial audio systems can employ cells that utilize techniques including (but not limited to) modal beamforming to generate directional audio. In many embodiments, a primary cell can utilize information concerning the spatial relationships between itself and its governed secondary cells to generate audio streams designed for playback on each specific cell. The primary cell can unicast a separate audio stream for each set of drivers of each secondary cell that it governs in order to coordinate spatial audio playback. As can be appreciated, the number of transmitted channels can be modified based on the number of drivers and horns of a cell (e.g. 3.1, 5, etc.). Given the spatial control of the audio, any number of different conventional surround sound speaker layouts (or indeed any arbitrary speaker layout) can be rendered using a number of cells that is significantly smaller than the number of conventional speakers that would be required to produce a similar sound field using conventional spatial audio rendering. Furthermore, upmixing and/or downmixing of channels of an audio source can be utilized to render a number of audio objects that may be different than the number of source channels.

In a variety of embodiments, cells can be utilized to provide the auditory sensation of being “immersed” in sound, for example, as if the user were at the focal point of a stereo audio system regardless of their location relative to the cells. In many embodiments, the sound field produced by the spatial audio system can be enhanced to spread sound energy more evenly within a space through the use of cells that are capable of rendering diffuse sound. In a number of embodiments, cells can generate diffuse audio by rendering directional audio in a way that controls the perceived ratio of direct to reverberant sound. As can readily be appreciated, the specific manner in which spatial audio systems generate diffuse audio can be dependent upon the room acoustics of the space occupied by the spatial audio system and the requirements of a specific application.

In a number of embodiments, cells that can generate spatial audio include arrays of drivers. In many embodiments, an array of drivers is distributed around a horizontal ring. In several embodiments, the cell can also include additional drivers such as (but not limited to) two opposite facing woofers oriented on a vertical axis. In certain embodiments, a horizontal ring of drivers can include three sets of horizontally aligned drivers, where each set includes a mid driver and a tweeter, referred to herein as a “halo.” In several embodiments, each set of a mid driver and a tweeter feeds a horn and a circular horn arrangement can be used to enhance directionality. While the particular form of the horns can be subject to the particular drivers used, the horn structure is referred to herein as a “halo”. In many embodiments, this driver arrangement in combination with the halo can enable audio beam steering using modal beamforming. As can readily be appreciated, any of a variety of cells can be utilized within spatial audio systems in accordance with various embodiments of the invention including cells having different numbers and types of drivers, cells having different placement of drivers such as (but not limited to) a tetrahedral configuration of drivers, cells that are capable of both horizontal and vertical beamforming, and/or cells that are incapable of producing directional audio.

Indeed, many embodiments of the invention include cells that do not include a woofer, mid driver, and/or tweeter. In various embodiments, a smaller form factor cell can be packaged to fit into a lightbulb socket. In numerous embodiments, larger cells with multiple halos can be constructed. Primary cells can negotiate generating audio streams for secondary cells that have different acoustic properties and/or driver/horn configurations. For example, a larger cell with two halos may need 6 channels of audio.

In addition, spatial audio systems in accordance with various embodiments of the invention can be implemented in any of a variety of environments including (but not limited to) indoor spaces, outdoor spaces, and the interior of vehicles such as (but not limited to) passenger automobiles. In several embodiments, the spatial audio system can be utilized as a composition tool and/or a performance instrument. As can readily be appreciated, the construction, placement, and/or use of spatial audio systems in accordance with many embodiments of the invention can be determined based upon the requirements of a specific application.

In order to do away with cumbersome wiring requirements, in numerous embodiments, cells are capable of wireless communication with other cells in order to coordinate rendering of sound fields. While media can be obtained from local sources, in a variety of embodiments, cells are capable of connecting to networks to obtain media content and other relevant data. In many embodiments, a network connected source input device can be used to directly connect to devices that provide media content for playback. Further, cells can create their own networks to reduce traffic-based latency during communication. In order to establish a network, cells can establish a hierarchy amongst themselves in order to streamline communication and processing tasks.

When a spatial audio system includes a single cell that can generate directional audio, the encoding and decoding processes associated with the nested architecture of the spatial audio system that produce audio inputs for the cell's drivers can be performed by the processing system of the single cell. When a spatial audio system utilizes multiple cells to produce a sound field, the processing associated with decoding one or more audio sources, spatially encoding the decoded audio source(s), and decoding the spatial audio and re-encoding it for each cell in an area is typically handled by a primary cell. The primary cell can then unicast the individual audio signals to each governed secondary cell. In a number of embodiments, a cell can act as a super primary cell coordinating synchronized playback of audio sources by multiple sets of cells that each include a primary cell.

However, in some embodiments, the primary cell provides audio signals for virtual speakers to governed secondary cells and spatial layout metadata to one or more secondary cells. In several embodiments, the spatial layout metadata can include information including (but not limited to) spatial relationships between cells, spatial relationships between cells and one or more audio objects, spatial relationships between one or more cells and one or more virtual speaker locations, and/or information regarding room acoustics. As can readily be appreciated, the specific spatial layout metadata provided by the primary cell is largely determined by the requirements of specific spatial audio system implementations. The processing system of a secondary cell can use the received audio signals and the spatial layout metadata to produce audio inputs for the secondary cell's drivers.

In many embodiments, rendering of sound fields by spatial audio systems can be controlled using any of a number of different input modalities including touch interfaces on individual cells, voice commands detected by one or more microphones incorporated within a cell and/or another device configured to communicate with the spatial audio system, and/or application software executing on a mobile device, personal computer, and/or other form of consumer electronic device. In many embodiments, the user interfaces enable selection of audio sources and identification of cells utilized to render a sound field from the selected audio source(s). User interfaces provided by spatial audio systems in accordance with many embodiments of the invention can also enable a user to control placement of spatial audio objects. For example, a user interface can be provided on a mobile device that enables the user to place audio channels from a channel-based surround sound audio source within a space. In another example, the user interface may enable placement of audio objects corresponding to different musicians and/or instruments within a space.

The ability of spatial audio systems in accordance with many embodiments of the invention to enable audio objects to be moved within a space enables the spatial audio system to render a sound field in a manner that tracks a user. By way of example, audio can be rendered in a manner that tracks the head pose of a user wearing a virtual reality, mixed reality, or augmented reality headset. In addition, spatial audio can be rendered in a manner that tracks the orientation of a tablet computer being used to view video content. In many embodiments, movement of spatial audio objects is achieved by panning a spatial representation of the audio source generated by the spatial audio system in a manner that is dependent upon a tracked user/object. As can readily be appreciated, the simplicity with which a spatial audio system can move audio objects can enable a host of immersive audio experiences for users. Indeed, audio objects can further be associated with visualizations that directly reflect the audio signal. Further, audio objects can be placed in a virtual “sound space” and assigned characters, objects, or intelligence to create an interactive scene that gets rendered as a sound field. Primary cells can process audio signals to provide metadata for use in visualization to a device used to provide the visualization.
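To illustrate why panning a spatial representation can be simple, the following sketch rotates a first-order B-format sample about the vertical axis. It assumes the first-order representation with X pointing to the front and Y to the left, and a yaw angle that is positive counterclockwise when viewed from above; the function name is hypothetical. The whole sound field turns with a single 2-by-2 rotation of X and Y, without re-encoding any individual audio object.

```python
import math

def rotate_b_format(w, x, y, z, yaw_deg):
    """Rotate a B-format sample about the vertical axis by yaw_deg.

    W (pressure) and Z (vertical gradient) are unaffected by a rotation
    about the vertical axis; only X and Y mix.
    """
    a = math.radians(yaw_deg)
    x_rot = x * math.cos(a) - y * math.sin(a)
    y_rot = x * math.sin(a) + y * math.cos(a)
    return w, x_rot, y_rot, z
```

A system tracking a listener who turns by some angle could apply the opposite yaw so that audio objects appear to stay fixed in the room.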

While many features of spatial audio systems and the cells that can be utilized to implement them are introduced above, the following discussion provides an in-depth exploration of the manner in which spatial audio systems can be implemented and the processes they can utilize to render sound fields from a variety of audio sources using an arbitrary number and placement of cells. Much of the discussion that follows references the use of ambisonic representations of audio objects in the generation of sound fields by spatial audio systems. However, spatial audio systems should be understood as not being limited to use of ambisonic representations. Ambisonic representations are described simply as an example of a spatial audio representation that can be utilized within a spatial audio system in accordance with many embodiments of the invention. It should be appreciated that any of a variety of spatial audio representations can be utilized in the generation of sound fields using spatial audio systems implemented in accordance with various embodiments of the invention including (but not limited to) VBAP representations, DBAP representations, and/or higher order ambisonic representations (e.g. sound field representations).

SECTION 1: Spatial Audio Systems

Spatial audio systems are systems that utilize arrangements of one or more cells to render spatial audio for a given space. Cells can be placed in any of a variety of arbitrary arrangements in any number of different spaces, including (but not limited to) indoor and outdoor spaces. While some cell arrangements are more advantageous than others, spatial audio systems described herein can function with high fidelity despite imperfect cell placement. In addition, spatial audio systems in accordance with many embodiments of the invention can render spatial audio using a particular cell arrangement despite the fact that the number and/or placement of cells may not correspond with assumptions concerning the number and placement of speakers utilized in the encoding of the original audio source. In many embodiments, cells can map their surroundings and/or determine their relative positions to each other in order to configure their playback to accommodate for imperfect placement. In numerous embodiments, cells can communicate wirelessly, and, in many embodiments, create their own ad hoc wireless networks. In various embodiments, cells can connect to external systems to acquire audio for playback. Connections to external systems can also be used for any number of alternative functions, including, but not limited to, controlling internet of things (IoT) devices, accessing digital assistants, playback control devices, and/or any other functionality as appropriate to the requirements of specific applications in accordance with various embodiments of the invention.

An example spatial audio system in accordance with an embodiment of the invention is illustrated in FIG. 1A. Spatial audio system 100 includes a set of cells 110. The set of cells in the illustrated embodiment includes a primary cell 112, and secondary cells 114. However, in many embodiments, the number of “primary” and “secondary” cells is dynamic and depends on the current number of cells added to the system and/or the manner in which the user has configured the spatial audio system. In many embodiments, a primary cell connects to a network 120 to connect to other devices. In numerous embodiments, the network is the internet, and the connection is facilitated via a router. In some embodiments, a cell contains a router and the capability to directly connect to the internet via a wired and/or wireless port. Primary cells can create ad hoc wireless networks to connect to other cells in order to reduce the overall amount of traffic being passed through a router and/or over the network 120. In some embodiments, when a large number of cells are connected to the system, a “super primary” cell can be designated which coordinates operation of a number of primary cells and/or handles the traffic over the network 120. In many embodiments, the super primary cell can disseminate information via its own ad hoc network to various primary cells, which then in turn disseminate relevant information to secondary cells. The network over which a primary cell communicates with a secondary cell can be the same ad hoc network as the one established by a super primary cell and/or a different one. An example system utilizing a super primary cell 116 in accordance with an embodiment of the invention is illustrated in FIG. 1B. The super primary cell communicates with primary cells 117 which in turn govern their respective secondary cells 118. Note that super primary cells can govern their own secondary cells. However, in some embodiments, cells may be located too far apart to establish an ad hoc network, but may be able to connect to existing network 120 via alternate means. In this situation, primary cells and/or super primary cells may communicate directly via the network 120. It should be appreciated that a super primary cell can act as a primary cell with respect to a particular subset of cells within a spatial audio system.
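The dissemination path described above (super primary cell to primary cells to secondary cells) can be pictured as a small tree walk. The following is a minimal sketch only; the `Cell` class, the `disseminate` function, the cell names and the message string are hypothetical and are not part of the described system.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Cell:
    """Hypothetical cell node: role is 'super_primary', 'primary' or 'secondary'."""
    name: str
    role: str
    children: List["Cell"] = field(default_factory=list)


def disseminate(cell: Cell, message: str, hops: int = 0) -> List[Tuple[str, int, str]]:
    """Pass a message down the hierarchy.

    Returns one (cell name, hops from the originating cell, message) entry
    for every cell that receives the message.
    """
    deliveries = [(cell.name, hops, message)]
    for child in cell.children:
        deliveries.extend(disseminate(child, message, hops + 1))
    return deliveries


# A super primary cell governing two primary cells plus one secondary cell of its own.
super_primary = Cell("cell-116", "super_primary", [
    Cell("cell-117a", "primary", [Cell("cell-118a", "secondary"),
                                  Cell("cell-118b", "secondary")]),
    Cell("cell-117b", "primary", [Cell("cell-118c", "secondary")]),
    Cell("cell-118d", "secondary"),
])

deliveries = disseminate(super_primary, "set_volume:40")
hops_by_cell = {name: hops for name, hops, _ in deliveries}
```

In this sketch a secondary cell governed by a primary cell is reached in two hops, while a secondary cell governed directly by the super primary cell is reached in one.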

Referring again to FIG. 1A, the network 120 can be any form of network, as noted above, including, but not limited to, the internet, a local area network, a wide area network, and/or any other type of network as appropriate to the requirements of specific applications in accordance with various embodiments of the invention. Furthermore, the network can be made of more than one network type utilizing wired connections, wireless connections, or a combination thereof. Similarly, the ad hoc network established by the cells can be any type of wired and/or wireless network, or any combination thereof. Communication between cells can be established using any number of wireless communication methodologies including, but not limited to, wireless local area networking technologies (WLAN), e.g. WiFi, Ethernet, Bluetooth, LTE, 5G A/R, and/or any other wireless communication technology as appropriate to the requirements of specific applications in accordance with various embodiments of the invention.

The set of cells can obtain media data from media servers 130 via the network. In numerous embodiments, the media servers are controlled by third parties that provide media streaming services such as, but not limited to: Netflix, Inc. of Los Gatos, Calif.; Spotify Technology S.A. of Stockholm, Sweden; Apple Inc. of Cupertino, Calif.; Hulu, LLC of Los Angeles, Calif.; and/or any other media streaming service provider as appropriate to the requirements of specific applications in accordance with various embodiments of the invention. In numerous embodiments, cells can obtain media data from local media devices 140, including, but not limited to, cellphones, televisions, computers, tablets, network attached storage (NAS) devices and/or any other device capable of media output. Media can be obtained from media devices via the network, or, in numerous embodiments, be directly obtained by a cell via a direct connection. The direct connection can be a wired connection through an input/output (I/O) interface, and/or a wireless connection using any of a number of wireless communication technologies.

The illustrated spatial audio system 100 can also (but does not necessarily need to) include a cell control server 150. In many embodiments, connections between media servers of various music services and cells within a spatial audio system are handled by individual cells. In several embodiments, cell control servers can assist with establishing connections between cells and media servers. For example, cell control servers may assist with authentication of user accounts with various third-party service providers. In a variety of embodiments, cells can offload processing of certain data to the cell control server. For example, mapping a room based on acoustic ranging may be sped up by providing the data to a cell control server which can in turn provide back to the cells a map of the room and/or other acoustic model information including (but not limited to) a virtual speaker layout. In numerous embodiments, cell control servers are used to remotely control cells, such as, but not limited to, directing cells to play back a particular piece of media content, changing volume, changing which cells are currently being utilized to play back a particular piece of media content, and/or changing the location of spatial audio objects in the area. However, cell control servers can perform any number of different control tasks that modify cell operation as appropriate to the requirements of specific applications in accordance with various embodiments of the invention. The manner in which different types of user interfaces can be provided for spatial audio systems in accordance with various embodiments of the invention is discussed further below.

In many embodiments, the spatial audio system 100 further includes a cell control device 160. Cell control devices can be any device capable of directly or indirectly controlling cells, including, but not limited to, cellphones, televisions, computers, tablets, and/or any other computing device as appropriate to the requirements of specific applications in accordance with various embodiments of the invention. In numerous embodiments, cell control devices can send commands to a cell control server which in turn sends the commands to the cells. For example, a mobile phone can communicate with a cell control server by connecting to the internet via a cellular network. The cell control server can authenticate a software application executing on the mobile phone. In addition, the cell control server can establish a secure connection to a set of cells to which it can pass instructions from the mobile phone. In this way, secure remote control of cells is possible. However, in numerous embodiments, the cell control device can directly connect to the cell via either the network, the ad hoc network, or via a direct peer-to-peer connection with a cell in order to provide instructions. In many embodiments, cell control devices can also operate as media devices. However, it is important to note that a control server is not a necessary component of a spatial audio system. In numerous embodiments, cells can manage their own control by directly receiving commands (e.g. through physical input on a cell, or via a networked device) and propagating those commands to other cells.

Further, in numerous embodiments, network connected source input devices can be included in spatial audio systems to collect and coordinate media inputs. For example, a source input device may connect to a television, a computer, a media server, or any number of media devices. In numerous embodiments, source input devices have wired connections to these media devices to reduce lag. A spatial audio system that includes a source input device in accordance with an embodiment of the invention is illustrated in FIG. 1C. The source input device 170 gathers audio data and any other relevant metadata from media devices like a computer 180 and/or a television 182, and unicasts the audio data and relevant metadata to a primary cell in a cluster of cells 190. However, it is important to note that source input devices can also act as a primary or super primary cell in some configurations. Further, any number of different devices can connect to source input devices, and they are not restricted to communicating with only one cluster of cells. In fact, source input devices can connect to any number of different cells as appropriate to the requirements of specific applications of embodiments of the invention.

While particular spatial audio systems are described above with respect to FIGS. 1A and 1B, any number of different spatial audio system configurations can be utilized including (but not limited to) configurations without connections to third party media servers, configurations that utilize different types of network communications, configurations in which a spatial audio system only utilizes cells and control devices with a local connection (e.g. not connected to the internet), and/or any other type of configuration as appropriate to the requirements of specific applications in accordance with various embodiments of the invention. A number of different spatial layouts of sets of cells are discussed below. As can readily be appreciated, a feature of systems and methods in accordance with various embodiments of the invention is that they are not limited to specific spatial layouts of cells. Accordingly, the specific spatial layouts described below are provided simply to illustrate the flexible manner in which spatial audio systems in accordance with many embodiments of the invention can render a given spatial audio source in a manner appropriate to the specific number and layout of cells that a user has placed within a space.

SECTION 2: Cell Spatial Layouts

An advantage of cells over conventional speaker arrangements is their ability to form a spatial audio system that can render spatial audio in a manner that accommodates the specific number and placement of the cells within the space. In many embodiments, cells can locate each other and/or map their surroundings in order to determine an appropriate method for reproducing spatial audio. In some embodiments, cells can generate suggested alternative arrangements via user interfaces that could improve the perceived quality of rendered sound fields. For example, a user interface rendered on a mobile phone could provide feedback regarding placement and/or orientation of cells within a particular space. As the number of cells increases, in general the spatial resolution capable of reproduction by the cells increases. However, depending on the space, a threshold may be met where any additional cell will not increase, or will only slightly increase, the spatial resolution.

Many different layouts are possible, and cells can adapt to any number of different configurations. A variety of different example layouts are discussed below. Following the discussion of the different layouts and the experiences they yield, a discussion of the manner in which sound fields can be created using cells is found below in Section 3.

Turning now to FIG. 2A, a single cell capable of generating directional audio using modal beamforming is shown in the center of a room in accordance with an embodiment of the invention. In many embodiments, a single cell can be placed in locations including (but not limited to) resting on the floor, resting on a counter, mounted on a stand or suspended from the ceiling. FIGS. 2B, 2C, and 2D represent a first order cardioid generated by an array of drivers positioned around the cell using modal beamforming techniques. While first order cardioids are illustrated, cells in accordance with many embodiments of the invention can also generate alternative directivity patterns including (but not limited to) supercardioids and hypercardioids. A single cell alone is capable of generating directional audio with the single cell as the origin, similar to an array of conventional speakers that are capable of performing modal beamforming. A single cell can also control perceived ratios of direct and reverberant audio by producing multiple beams in a manner that is dependent upon the acoustic environment, as illustrated in accordance with an embodiment of the invention in FIG. 2E. The cell can map acoustic reflections based on the walls, floor, ceiling and/or objects in the room, and modify its driver inputs to create diffuse sound. Cardioids reflecting the manner in which a cell that includes a halo having three horns in accordance with an embodiment of the invention can steer the directivity pattern produced by the cell are illustrated in FIG. 2F. One of a number of higher order directivity patterns that can also be produced by the cell is illustrated in FIG. 2G.

As can readily be appreciated, cells are not limited to any particular configuration of drivers and the directivity patterns that can be generated by a cell are not limited to those described herein. For example, while cardioids are shown in the above referenced figures, supercardioids or hypercardioids can be used in addition to, or as a replacement for, cardioids based on horn and/or driver arrangement. Supercardioids have a null near ±120° which can reduce attenuation at horns arranged at ±120° as can be found in many halos. Similarly, hypercardioids also have a null at ±120° that can provide even better directivity at the cost of a larger side lobe at 180°. As can be readily appreciated, different ambisonics, including mixed ambisonics, can be used depending on horn and/or driver arrangement as appropriate to the requirements of specific applications of embodiments of the invention. In addition, drivers can produce directional audio using any of a variety of directional audio production techniques.
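The relationship between a first order directivity pattern and the position of its null can be made concrete with the standard first order form g(θ) = a + (1 − a)·cos θ, where a = 0.5 yields a cardioid. The following is an illustrative sketch only; the function names are hypothetical, and the sketch simply solves for the weight a that places the null at a requested angle such as the ±120° horn spacing discussed above.

```python
import math


def first_order_gain(a: float, theta_deg: float) -> float:
    """Gain of the first order pattern a + (1 - a) * cos(theta)."""
    return a + (1.0 - a) * math.cos(math.radians(theta_deg))


def null_angle_deg(a: float) -> float:
    """Angle in [90, 180] degrees at which the pattern gain is zero."""
    if not 0.0 <= a <= 0.5:
        raise ValueError("a first order pattern has a null only for 0 <= a <= 0.5")
    return math.degrees(math.acos(-a / (1.0 - a)))


def coefficient_for_null(null_deg: float) -> float:
    """Weight a that places the null at the requested angle (90 to 180 degrees)."""
    c = math.cos(math.radians(null_deg))
    return c / (c - 1.0)


# Weight that places nulls at +/-120 degrees, and the resulting rear lobe at 180 degrees.
a_120 = coefficient_for_null(120.0)
rear_lobe = first_order_gain(a_120, 180.0)
```

Under these assumptions, a null at ±120° corresponds to a = 1/3, and the pattern then has a rear lobe of magnitude 1/3 (inverted polarity) at 180°, whereas a cardioid (a = 0.5) has its single null at 180° and no rear lobe.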

By adding a second cell, the two cells can begin to interact and coordinate sound production in order to produce spatial audio with increased spatial resolution. The placement of the cells in a room can impact how the cells configure themselves to produce sound. An example of two cells placed diagonally in a room in accordance with an embodiment of the invention is illustrated in FIG. 3A. As shown in FIG. 3B, the cells can project sound at each other. While only one cardioid wave pattern is shown per cell, cells can produce multiple beams and/or directivity patterns to manipulate the sound field across the entire room. An alternative arrangement with two cells against a shared wall in accordance with an embodiment of the invention is illustrated in FIG. 4A and FIG. 4B. In this configuration, there may be issues with volume balance on the opposite facing wall most distant from the cells due to the imbalanced placement. However, cells can diminish the impact of this arrangement by appropriately modifying the sound produced by the drivers.

Cells need not necessarily be placed in corners of rooms. FIG. 5A and FIG. 5B illustrate a placement of two cells in accordance with an embodiment of the invention. In many situations, this can be an optimal placement acoustically. However, depending on the room and the objects within it, it may not be practical to place cells in this configuration. Furthermore, while cells have been illustrated with drivers facing in a particular direction, depending on the room, the cells can be rotated to a more appropriate orientation for the space. In numerous embodiments, the spatial audio system and/or specific cells can utilize their user interfaces to suggest that a particular cell be rotated to provide placement that is more appropriate to the space and/or positioning relative to other cells.

In numerous embodiments, once three cells have been networked in the same space, complete control and reproduction of spatial sound objects can be achieved in at least the horizontal plane. In various embodiments, depending on the room, an equilateral triangular arrangement can be utilized. However, cells are able to adapt and adjust to maintain control over the sound field in alternative arrangements. A three-cell arrangement, where each cell is capable of producing directional audio using modal beamforming, in accordance with an embodiment of the invention is illustrated in FIGS. 6A and 6B. By adding an overhead cell, additional 3D spatial control can be gained over the sound field. FIGS. 7A and 7B illustrate a three cell grouping with an additional central overhead cell suspended from the ceiling in accordance with an embodiment of the invention.

Cells can be “grouped” to operate in tandem to spatially play back a piece of media. Often, groups include all of the cells in a room. However, particularly in very large spaces, groups do not necessarily include all cells in the room. Groups can be further aggregated into “zones.” Zones can further include single cells that have not been grouped (or alternatively can be considered in their own group with a cardinality of one). In some embodiments, each group in a zone may be playing back the same piece of media, but may be spatially locating the objects differently. An example home layout of cells in accordance with an embodiment of the invention is illustrated in FIG. 8A. Example groups in accordance with an embodiment of the invention are illustrated in FIG. 8B, and example zones are illustrated in FIG. 8C. Groupings and zones can be adjusted in real time by users, and cells can dynamically readapt to their groupings. As can be readily appreciated, cells can be placed in any arbitrary configuration within a physical space. Non-exhaustive examples of alternative arrangements are shown in accordance with an embodiment of the invention in FIG. 8D. Similarly, cells can be grouped in any arbitrary arrangement as desired by a user. In addition, some cells utilized in many spatial audio systems are incapable of generating directional audio, but may still be incorporated into spatial audio systems. Processes for enabling cells to perform spatial audio rendering in a synchronized and controllable manner regardless of their positioning are discussed below.

SECTION 3: Spatial Audio Rendering

Spatial audio has traditionally been rendered with a static array of speakers located in prescribed locations. While, up to a point, more speakers in the array is conventionally thought of as “better,” consumer grade systems have currently settled on 5.1 and 7.1 channel systems, which use 5 speakers, and 7 speakers, respectively in combination with one or more subwoofers. Currently, some media is supported in up to 22.2 (e.g. in Ultra HD Television as defined by the International Telecommunication Union). In order to play higher channel sound on fewer speakers, audio inputs are generally either downmixed to match the number of speakers present, or channels that do not match the speaker arrangement are merely dropped. An advantage of systems and methods described herein is the ability to create any number of audio objects based upon the number of channels used to encode the audio source. For example, an arrangement of three cells could generate the auditory sensation of the presence of a 5.1 speaker arrangement by placing five audio objects in the room, encoding the five audio objects into a spatial representation (e.g. an ambisonic representation such as (but not limited to) B-format), and then rendering a sound field using the three cells by decoding the spatial representation of the original 5.1 audio source in a manner appropriate to the number and placement of cells (see discussion below). In many embodiments, the bass channel can be mixed into the driver signals for each of the cells. Processes that treat channels as spatial audio objects are extensible to any arbitrary number of speakers and/or speaker arrangements. In this way, fewer physical speakers in the room can be utilized to achieve the effects of a higher number of speakers. Furthermore, cells need not be placed precisely in order to achieve this effect.
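The encode-then-decode flow in the 5.1 example above can be sketched with a horizontal first order B-format representation. This is a minimal illustration under stated assumptions, not the decoder used by any particular embodiment: the channel azimuths are nominal values chosen for the example, the cells are assumed to be evenly spaced around the encoding origin, and the decoder is a basic sampling decoder. All names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Assumed nominal azimuths (degrees, counter-clockwise from front) for the
# five full-range channels of a 5.1 source, each treated as an audio object.
CHANNEL_AZIMUTHS: Dict[str, float] = {
    "L": 30.0, "R": -30.0, "C": 0.0, "Ls": 110.0, "Rs": -110.0,
}


def encode_first_order(sample: float, azimuth_deg: float) -> Tuple[float, float, float]:
    """Encode one sample of an audio object into horizontal B-format (W, X, Y)."""
    az = math.radians(azimuth_deg)
    return (sample / math.sqrt(2.0), sample * math.cos(az), sample * math.sin(az))


def decode_to_cells(bformat: Tuple[float, float, float],
                    cell_azimuths_deg: Sequence[float]) -> List[float]:
    """Basic sampling decoder; assumes cells evenly spaced around the origin."""
    w, x, y = bformat
    n = len(cell_azimuths_deg)
    feeds = []
    for az_deg in cell_azimuths_deg:
        az = math.radians(az_deg)
        feeds.append((math.sqrt(2.0) * w + 2.0 * (x * math.cos(az) + y * math.sin(az))) / n)
    return feeds


def render_frame(channel_samples: Dict[str, float],
                 cell_azimuths_deg: Sequence[float]) -> List[float]:
    """Sum the encoded audio objects, then decode once for the cell layout."""
    w = x = y = 0.0
    for name, sample in channel_samples.items():
        dw, dx, dy = encode_first_order(sample, CHANNEL_AZIMUTHS[name])
        w, x, y = w + dw, x + dx, y + dy
    return decode_to_cells((w, x, y), cell_azimuths_deg)


cells = [0.0, 120.0, 240.0]                      # three cells, assumed evenly spaced
center_only = render_frame({"C": 1.0}, cells)    # object located at one cell's azimuth
all_channels = render_frame({name: 0.2 for name in CHANNEL_AZIMUTHS}, cells)
```

In this sketch the number of encoded objects (five) is independent of the number of cells (three): changing the cell layout only changes the azimuth list passed to the decoder. The bass channel is omitted here; as noted above, it can be mixed into the driver signals for each cell.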

Conventional audio systems typically have what is often referred to as a “sweet spot” at which the listener should be situated. In numerous embodiments, the spatial audio system can use information regarding room acoustics to control the perceived ratio between direct and reverberant sound in a given space such that it sounds like a listener is surrounded by sound, regardless of where they are located within the space. While most rooms are very non-diffuse, spatial rendering methods can involve mapping a room and determining an appropriate sound field manipulation for rendering diffuse audio (see discussion below). Diffuse sound fields are typically characterized by sound arriving randomly from evenly distributed directions at evenly distributed delays.

In many embodiments, the spatial audio system maps a room. Cells can use any of a variety of methods for mapping a room, including, but not limited to, acoustic ranging, applying machine vision processes, and/or any other ranging method that enables 3D space mapping. Other devices can be utilized to create or augment these maps, such as smart phones or tablet PCs. The mapping can include: the location of cells in the space; wall, floor, and/or ceiling placements; furniture locations; and/or the location of any other objects in a space. In several embodiments, these maps can be used to generate speaker placement and/or orientation recommendations that can be tailored to the particular location. In some embodiments, these maps can be continuously updated with the location of listeners traversing the space and/or a history of the location(s) of listeners. As is discussed further below, many embodiments of the invention utilize virtual speaker layouts to render spatial audio. In several embodiments, information including (but not limited to) any of cell placement and/or orientation information, room acoustic information, and/or user/object tracking information can be utilized to determine an origin location at which to encode a spatial representation (e.g. an ambisonic representation) of an audio source and a virtual speaker layout to use in the generation of driver inputs at individual cells. Various systems and methods for rendering of spatial audio using spatial audio systems in accordance with certain embodiments of the invention are discussed further below.

In a number of embodiments, upmixing can be utilized to create a number of audio objects that differs from the number of channels. In several embodiments, a stereo source containing two channels can be upmixed to create a number of left (L), center (C), and right (R) channels. In a number of embodiments, diffuse audio channels can also be generated via upmixing. Audio objects corresponding to the upmixed channels can then be placed relative to a space defined by a number of cells to create various effects including (but not limited to) the sensation of stereo everywhere within the space as conceptually illustrated in FIG. 45. In certain embodiments, upmixing can be utilized to place audio objects relative to a virtual stage as conceptually illustrated in FIG. 46. In a number of embodiments, audio objects can be placed in 3D as conceptually illustrated in FIG. 47. While specific examples of the placement of audio objects are discussed with reference to FIGS. 45-47, any of a variety of audio objects (including audio objects obtained directly by the spatial audio system that are not obtained via upmixing) can be placed in any of a variety of arbitrary 1D, 2D, and/or 3D configurations for the purposes of rendering spatial audio as appropriate to requirements of specific applications in accordance with various embodiments of the invention. The rendering of spatial audio from a variety of different audio sources is discussed further below. Furthermore, any of the audio object 2D or 3D layouts described above with reference to FIGS. 45-47 can be utilized in any of the processes for selecting and processing sources of audio within a spatial audio system described herein in accordance with various embodiments of the invention.

In many embodiments, spatial audio systems include source managers that can select between one or more sources of audio for rendering. FIG. 9 illustrates a spatial audio system 900 that includes a source manager 906 configured in accordance with various aspects of the method and apparatus for spatial multimedia source management disclosed herein. As noted above, the spatial audio system 900 may be implemented using a cell and/or using multiple cells. The source manager 906 can receive a multimedia input 902 that includes a variety of data and information used by the source manager 906 to generate and manage content 908 and rendering information 910. The content 908 can include encoded audio that is to be spatially rendered, selected from the multimedia sources in the multimedia input 902. The rendering information 910 can provide context for the reproduction of the content 908 in terms of how the sound should be presented, both spatially (telemetry) and in terms of volume (level), as further described herein. In many embodiments, the source manager is implemented within a cell in the spatial audio system. In several embodiments, the source manager is implemented on a server system that communicates with one or more of the cells within the spatial audio system. In a number of embodiments, the spatial audio system includes network connected source input devices that enable the connection of sources (e.g. wall mounted televisions) to the network connected source input device in a location distant from the closest cell. In several embodiments, the network connected source input device implements a source manager that can direct selected sources for rendering on cells within the spatial audio system 900.

A user may directly control the spatial audio system 900 through a user interaction input 904. The user interaction input 904 may include commands received from the user through a user interface, including a graphical user interface (GUI) on an app on a “smart device,” such as a smartphone; voice input, such as through commands issued to a “virtual assistant,” such as Apple Inc.'s Siri, Amazon.com Inc.'s Alexa, or Google Assistant from Google LLC (Google); and “traditional” physical interfaces such as buttons, dials, and knobs. The user interface may be coupled to the source manager 906 and, in general, the spatial audio system 900, directly or through a wireless interface, such as through Bluetooth or Wi-Fi wireless standards as promulgated by the IEEE in IEEE 802.15.1 and IEEE 802.11 standards, respectively. One or more of the cells utilized within the spatial audio system 900 can also include one or more of a touch (e.g. buttons and/or capacitive touch) or voice based user interaction input 904.

The source manager 906 can provide the content 908 and the rendering information 910 to a multimedia rendering engine 912. The multimedia rendering engine 912 can generate audio signals and spatial layout metadata 914 and provide them to a set of cells 916-1 to 916-n based on the content 908 and the rendering information 910. In many embodiments, the audio signals are audio signals with respect to specific audio objects. In several embodiments, the audio signals are virtual speaker audio inputs. The specific spatial layout metadata 914 provided to the cells typically depends upon the nature of the audio signals (e.g. locations of audio objects and/or locations of virtual speakers). Thus, using the set of cells 916-1 to 916-n, the multimedia rendering engine 912 may reproduce the content 908, which may include multiple sound objects, distributed in a room based on the rendering information 910. Various approaches for performing spatial audio rendering using cells in accordance with various embodiments of the invention are discussed further below.

In several embodiments, the audio signals and (optionally) spatial layout metadata 914 provided by the multimedia rendering engine 912 to the cells 916-1 to 916-n may include a separate data stream generated specifically for each cell. The cells can generate driver inputs using the audio signals and (optionally) the spatial layout metadata 914. In a number of embodiments, the multimedia rendering engine 912 can produce multiple audio signals for each individual cell, where each audio signal corresponds to a different direction. When a cell receives the multiple audio signals, the cell can utilize the multiple audio signals to generate driver inputs for a set of drivers corresponding to each of the plurality of directions. For example, a cell that includes three sets of drivers oriented in three different directions can receive three audio signals that the cell can utilize to generate driver inputs for each of the three sets of drivers. As can readily be appreciated, the number of audio signals can depend upon the number of sets of drivers and/or upon other factors appropriate to the requirements of specific applications in accordance with various embodiments of the invention. Furthermore, the rendering engine 912 can produce audio signals specific to each cell and also provide the same bass signal to all cells.

As noted above, each cell may include one or more sets of different types of audio transducers. For example, each of the cells may be implemented using a set of drivers that includes one or more bass, mid-range, and tweeter drivers. A filter, such as (but not limited to) a crossover filter, may be used so that an audio signal can be divided into a low-pass signal that can be used in the generation of driver inputs to one or more woofers, a bandpass signal that can be used in the generation of driver inputs to one or more mids, and a high-pass signal that can be used in the generation of driver inputs to one or more tweeters. As can readily be appreciated, the audio frequency bands utilized to generate driver inputs to different classes of drivers can overlap as appropriate to the requirements of specific applications. Furthermore, any number of drivers and/or orientations of drivers can be utilized to implement a cell as appropriate to the requirements of specific applications in accordance with various embodiments of the invention.
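The division of an audio signal into low-pass, bandpass and high-pass signals can be sketched with two first order low-pass filters and subtraction. This is an illustrative simplification only: the cutoff frequencies, sample rate and function names are assumptions, and a practical crossover would typically use steeper filters than the one-pole filters shown here.

```python
import math
from typing import List, Sequence, Tuple


def one_pole_lowpass(signal: Sequence[float], cutoff_hz: float,
                     sample_rate: float) -> List[float]:
    """First order low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, state = [], 0.0
    for x in signal:
        state += alpha * (x - state)
        out.append(state)
    return out


def split_bands(signal: Sequence[float], low_hz: float = 200.0,
                high_hz: float = 2000.0,
                sample_rate: float = 48000.0) -> Tuple[List[float], List[float], List[float]]:
    """Split a signal into woofer, mid and tweeter bands.

    The bands are built by subtraction, so they overlap gently and sum
    back to the input exactly.
    """
    below_low = one_pole_lowpass(signal, low_hz, sample_rate)
    below_high = one_pole_lowpass(signal, high_hz, sample_rate)
    woofer = below_low                                        # low-pass signal
    mid = [h - l for h, l in zip(below_high, below_low)]      # bandpass signal
    tweeter = [x - h for x, h in zip(signal, below_high)]     # high-pass signal
    return woofer, mid, tweeter


# A constant (0 Hz) input should settle almost entirely in the woofer band.
step = [1.0] * 4000
woofer, mid, tweeter = split_bands(step)
```

Because each band is formed by subtraction from the same input, adjacent bands overlap, which is consistent with the observation above that the frequency bands used for different classes of drivers can overlap.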

As is discussed further below, spatial audio systems in accordance with many embodiments of the invention can utilize a variety of processes for spatially rendering one or more audio sources. The specific processes typically depend upon the nature of the audio sources, the number of cells, the layout of the cells, and the specific spatial audio representation and nested architecture utilized by the spatial audio system. FIG. 10 illustrates one process 1000 for rendering sound fields that may be implemented by a spatial audio system in accordance with an embodiment of the invention. At 1002, the spatial audio system receives a plurality of multimedia source inputs. One or more content sources may be selected and preprocessed by a source selection software process executing on a processor, and the data and information associated therewith can be provided to an enumeration determination software process.

At 1004, a number of sources that are selected for rendering is determined by an enumeration determination software process. The enumeration information can be provided to a position management software process that allows for the tracking of the number of content sources.

At 1006, position information for each content source to be spatially rendered can be determined by the position management software process. As discussed above, various factors, including (but not limited to) the type of content being played, positional information of the user or an associated device, and/or historical/predicted position information, may be used to determine position information relevant to subsequent software processes utilized to spatially render the content sources.

At 1008, interactions between the enumerated content sources at various positions can be determined by an interaction management software process. The various interactions may be determined based on various factors such as (but not limited to) those discussed above, including (but not limited to) type of content, position of playback and/or positional information of the user or an associated device, and historical/predicted interaction information.

At 1010, information including (but not limited to) content and rendering information can be generated and provided to the multimedia rendering engine.

In one aspect of the disclosure, the position of playback associated with each content source determined at 1006 can occur before interaction between the content sources is determined at 1008. This can allow for a more complete management of rendering of spatial audio sources. Thus, for example, if multiple content sources are being played in close proximity, interaction/mixing may be determined based on awareness of that positional proximity. Moreover, a priority level for each content source may also be considered.
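The ordering of steps 1002 through 1010, with positions determined before interactions, can be sketched as a single function. This is a hypothetical illustration: the `ContentSource` fields, the proximity threshold, the rule that a lower priority source is attenuated when a higher priority source plays nearby, and the attenuation value are all assumptions introduced for the example rather than features of process 1000.

```python
from dataclasses import dataclass
from itertools import combinations
from math import dist
from typing import Dict, List, Tuple


@dataclass
class ContentSource:
    name: str
    priority: int                   # assumed convention: higher value wins
    position: Tuple[float, float]   # assumed room coordinates in metres
    selected: bool = True


def manage_sources(inputs: List[ContentSource], proximity_m: float = 2.0,
                   duck_gain: float = 0.3) -> Dict:
    # 1002: select the content sources to be rendered.
    selected = [s for s in inputs if s.selected]
    # 1004: enumerate the selected sources.
    count = len(selected)
    # 1006: determine a playback position for each source.
    positions = {s.name: s.position for s in selected}
    # 1008: determine interactions, using the positions found at 1006.
    levels = {s.name: 1.0 for s in selected}
    for a, b in combinations(selected, 2):
        if a.priority != b.priority and dist(a.position, b.position) <= proximity_m:
            lower = a if a.priority < b.priority else b
            levels[lower.name] = min(levels[lower.name], duck_gain)
    # 1010: produce content and rendering information for the rendering engine.
    return {
        "content": [s.name for s in selected],
        "count": count,
        "rendering": {n: {"position": positions[n], "level": levels[n]}
                      for n in positions},
    }


result = manage_sources([
    ContentSource("music", priority=1, position=(0.0, 0.0)),
    ContentSource("doorbell", priority=5, position=(1.0, 0.0)),
    ContentSource("tv", priority=2, position=(10.0, 0.0)),
    ContentSource("radio", priority=1, position=(0.0, 1.0), selected=False),
])
```

In this sketch the music is attenuated because a higher priority source plays within the proximity threshold, while the distant television is unaffected; the interaction step could not make that distinction if positions had not been determined first.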

In accordance with various aspects of the disclosure, information received in the preset/history information may be used by the source manager to affect the content and the rendering information that is provided to the multimedia rendering engine. The information may include user-defined presets and history of how various multimedia sources have been handled before. For example, a user may define a preset that all content received over a particular HDMI input is reproduced in a particular location, such as the living room. As another example, historical data may indicate that the user always plays time alarms in the bedroom. In general, historical information may be used to heuristically determine how multimedia sources may be rendered.
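One simple way to combine presets and history is to let a user-defined preset take precedence and otherwise fall back to the location most often used for that source in the past. The sketch below assumes that precedence rule and that "most frequent" is the heuristic; the function name, source identifiers and location names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def choose_location(source_id: str,
                    presets: Dict[str, str],
                    history: Dict[str, List[str]]) -> Optional[str]:
    """Pick a playback location for a source.

    Assumed rule: a preset wins; otherwise use the most frequent
    historical location; otherwise return None (no information).
    """
    if source_id in presets:
        return presets[source_id]
    past_locations = history.get(source_id, [])
    if past_locations:
        return Counter(past_locations).most_common(1)[0][0]
    return None


presets = {"hdmi-1": "living room"}
history = {
    "alarm": ["bedroom", "bedroom", "kitchen", "bedroom"],
    "hdmi-1": ["office"],
}

hdmi_location = choose_location("hdmi-1", presets, history)
alarm_location = choose_location("alarm", presets, history)
unknown_location = choose_location("podcast", presets, history)
```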

Although specific spatial audio systems that include source managers and multimedia rendering engines and processes for implementing source managers and multimedia rendering engines are described above with reference to FIGS. 9 and 10, spatial audio systems can utilize any of a variety of hardware and/or software processes to select audio sources and render sound fields using a set of cells as appropriate to the requirements of specific applications in accordance with various embodiments of the invention. Processes for rendering sound fields by encoding representations of spatial audio sources and decoding the representations based upon a specific cell configuration in accordance with various embodiments of the invention are discussed further below.

SECTION 4A: Nested Architectures

Spatial audio systems in accordance with many embodiments of the invention utilize a nested architecture that can have particular advantages in that it enables spatial audio rendering in a manner that can adapt to the number and configuration of the cells and/or loudspeakers being used to render the spatial audio. In addition, the nested architecture can distribute the processing associated with rendering of spatial audio across a number of computing devices within the spatial audio system. The specific manner in which a nested architecture of encoders and decoders within a spatial audio system is implemented is largely dependent upon the requirements of a given application. Furthermore, individual encoder and/or decoder functions can be distributed across cells. For example, a primary cell can partially perform the function of a cell decoder to decode audio streams specific to a cell. The primary cell can then provide these audio streams to the relevant secondary cell. The secondary cell can then complete the cell decoding process by converting the audio streams to driver signals. As can readily be appreciated, spatial audio systems in accordance with various embodiments of the invention can utilize any of a variety of nested architectures as appropriate to the requirements of specific applications.

In several embodiments, a primary cell within a spatial audio system spatially encodes separate audio signals for each audio object being rendered. As discussed above, the audio objects can be directly provided to the spatial audio system, obtained by mapping channels of the source audio to corresponding audio objects and/or obtained by upmixing and mapping channels of the source audio to corresponding audio objects as appropriate to the requirements of a specific application. The primary cell can then decode the spatial audio signals for each audio object based upon the locations of the cells being used to render the spatial audio. A given cell can use its specific audio signals to encode a spatial audio signal for that cell, which can then be decoded to generate signals for each of the cell's drivers.

When each audio object is separately spatially encoded, the amount of data transmitted by a primary cell within the network increases with the number of spatial objects. Another approach in which the amount of data transmitted by a primary cell is independent of the number of audio objects is for the primary cell to spatially encode all audio objects into a single spatial representation. The primary cell can then decode the spatial representation of all of the audio objects with respect to a set of virtual speakers. The number and locations of the virtual speakers are typically determined based upon the number and locations of the cells used to render the spatial audio. In many embodiments, however, the number of virtual speakers can be fixed irrespective of the number of cells, but have locations that are dependent upon the number and locations of cells. For example, a spatial audio system can utilize eight virtual speakers located around the circumference of a circle in certain use cases (irrespective of the number of cells). As can readily be appreciated, the number of virtual speakers can depend upon the number of grouped cells and/or the number of channels in the source. Furthermore, the number of virtual speakers can be greater than or less than eight. The primary cell can then provide a given cell with a set of audio signals decoded based upon the locations of the virtual speakers associated with that cell. The virtual speaker inputs can be converted into a set of driver inputs by treating the virtual speakers as audio objects and performing a spatial encoding based upon the cell's position relative to the virtual speaker locations. The cell can then decode the spatial representation of the virtual speakers to generate driver inputs. In many embodiments, the cells can efficiently convert received virtual speaker inputs into a set of driver inputs using a set of filters. In several embodiments, the primary cell can commence the decoding of the virtual speaker inputs into a set of audio signals for each cell, where each audio signal corresponds to a specific direction. When the set of audio signals is provided to a secondary cell, the secondary cell can utilize each audio signal to generate driver inputs for a set of drivers oriented to project sound in a particular direction.

In several embodiments, the spatial encodings performed within a nested architecture involve encoding the spatial objects into ambisonic representations. In many embodiments, the spatial encodings performed within the nested architecture utilize higher order ambisonics (e.g. sound field representation), a Vector Based Amplitude Panning (VBAP) representation, a Distance Based Amplitude Panning (DBAP) representation, and/or a k-Nearest-Neighbors panning (KNN panning) representation. As can readily be appreciated, the spatial audio system may support multiple spatial encodings and can select between a number of different spatial audio encoding techniques based upon factors including (but not limited to): the nature of the audio source, the layout of a particular group of cells, and/or user interactions with the spatial audio system (e.g. spatial audio object placement and/or spatial encoding control instructions). As can readily be appreciated, any of a variety of spatial audio encoding techniques can be utilized within a nested architecture as appropriate to the requirements of specific applications in accordance with various embodiments of the invention. Furthermore, the specific manner in which spatial representations of audio objects are decoded to provide audio signals to individual cells can depend upon factors including (but not limited to) the number of audio objects, the number of virtual speakers (where the nested architecture utilizes virtual speakers) and/or the number of cells.

FIG. 11 conceptually illustrates a process 1100 for spatial audio control and reproduction that involves creating an ambisonic encoding of an audio source by treating different channels as spatial sound objects. The audio objects can then be placed in distinct locations and the locations of the audio objects used to generate an ambisonic representation of a sound field at a selected origin location. While FIG. 11 is described in the context of a spatial audio system that uses ambisonic representations of spatial audio, processes similar to those illustrated in FIG. 11 can be implemented using any of a variety of spatial audio representations including (but not limited to) higher order ambisonics (e.g. sound field representation), a VBAP representation, a DBAP representation, and/or a KNN panning representation.

The process 1100 can be implemented by a spatial audio system and can involve a system encoder 1112 that provides conversion of audio rendering information into an intermediate format. In many embodiments, the conversion process can involve demultiplexing encoded audio data that encodes one or more audio tracks and/or audio channels from a container file or portion of a container file. The audio data can then be decoded to create a plurality of separate audio inputs that can each be treated as a separate sound object. In one aspect, the system encoder 1112 can encode sound objects and their associated information (e.g., position) for a particular environment. Examples can include (but are not limited to) a desired speaker layout for a channel-based audio surround sound system, a band position template, and/or an orchestra template for a set of instruments.

The system encoder 1112 may position, or map, sound objects and operate in a fashion such as a panner. The system encoder 1112 can receive information about sound objects in sound information 1102 and render, in a generalized form, these sound objects. The system encoder 1112 can be agnostic to any implementation details (e.g. number of cells, and/or placement and orientation of cells), which are handled downstream by decoders, as further described herein. In addition, the system encoder 1112 may receive sound information in a variety of content and formats, including (but not limited to) channel-based sound information, discrete sound objects, and/or sound fields.

FIG. 12A illustrates a conceptual representation of a physical space 1200 with an example mapping of sound objects by the system encoder 1112 that may be used to describe various aspects of the operation of the system encoder 1112. In one aspect of the disclosure, the system encoder 1112 performs the mapping of sound objects using a coordinate system in which positional information is defined relative to an origin. The origin and coordinate system may be arbitrary and can be established by the system encoder 1112. In the example as shown in FIG. 12A, the system encoder 1112 establishes an origin 1202 at location [0,0] for a Cartesian coordinate system in the conceptual representation, with the four corners of the coordinate system being [−1,−1], [−1,1], [1,−1], and [1,1]. The sound information provided to the system encoder 1112 includes a sound object S 1212 that the system encoder 1112 maps to location [0,1] in the conceptual representation. It should be noted that although the example provided in FIG. 12A is expressed in terms of the Cartesian coordinate system in two dimensions, other coordinate systems and dimensions may be used, including polar, cylindrical, and spherical coordinate systems. A particular choice of the coordinate system used in the examples herein should not be considered limiting.

In some cases, the system encoder 1112 may apply a static transform of the coordinate system of the system encoder 1112 to adapt to an initial orientation of external playback or control devices including, but not limited to, head mounted displays, mobile phones, tablets, or gaming controllers. In other cases, the system encoder 1112 may receive a constant stream of telemetry data associated with a user, such as, for example, from a 6 degree of freedom (6DOF) system, and continually reposition sound objects in order to maintain a particular rendering using this stream of telemetry data.

The system encoder 1112 can generate, as output, an ambisonic encoding of the spatial audio objects in an intermediary format (e.g. B-format) 1122. As noted above, other formats can be utilized to represent spatial audio information as appropriate to the requirements of specific applications including (but not limited to) formats capable of representing second and/or higher order ambisonics. In FIG. 11, the sound field information is shown as sound field information 1122, which can include mapping information about sound objects such as the sound object S 1212.

Referring again to FIG. 11, the system 1100 includes a system decoder 1132 that may be used to receive ambisonic encodings 1122 of the spatial audio objects from the system encoder 1112 and provide system-level ambisonic decoding for each of the cells in the spatial audio system 1100. In one aspect of the disclosure, the system decoder 1132 is aware of the cells and their physical layouts and allows the system 1100 to process the sound information 1102 appropriately to reproduce audio with the particular speaker arrangement and environment (e.g., room).

FIG. 12B illustrates a conceptual representation of the physical space corresponding to the conceptual representation of FIG. 12A that includes an overlay of a layout of a group of cells. The group of cells includes three (3) cells: cell 1 1270_SN1, cell 2 1270_SN2, and cell 3 1270_SN3. The system decoder 1132 adapts the mapping performed by the system encoder 1112 with actual physical measurements to arrive at the conceptual representation shown in FIG. 12B. Thus, in the conceptual representation shown in FIG. 12B, the corners of the conceptual representation shown in FIG. 12A have been translated to locations [−X,−Y], [−X,Y], [X,−Y], and [X,Y], where X and Y represent physical dimensions of the physical space. For example, if the physical space is defined to be a 20 meter by 14 meter room, then X may be 10 and Y may be 7, because the corners lie at ±X and ±Y from the origin. The sound object S 1212 is mapped to location [0,y_S]. While not shown in FIG. 12B, the spatial locations of the cells are determined in three dimensions in spatial audio systems in accordance with many embodiments of the invention.
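By way of a non-limiting illustration, the adaptation of the normalized mapping of FIG. 12A to physical dimensions can be sketched as a linear scaling about an origin at the center of the room. The function name `to_physical`, the choice of a centered origin, and the linear scaling are assumptions made for this sketch only; the disclosure does not prescribe a particular transform.

```python
def to_physical(norm_xy, room_width_m, room_depth_m):
    """Scale a normalized [-1, 1] x [-1, 1] position to metres.

    Assumes the origin sits at the centre of the room, so the half-width
    and half-depth play the roles of X and Y in the text.
    """
    half_width = room_width_m / 2.0   # X
    half_depth = room_depth_m / 2.0   # Y
    nx, ny = norm_xy
    return (nx * half_width, ny * half_depth)


# Corners of the conceptual representation of FIG. 12A.
corners = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
physical_corners = [to_physical(c, 20.0, 14.0) for c in corners]

# Sound object S was mapped to [0, 1] by the system encoder.
sound_object_s = to_physical((0, 1), 20.0, 14.0)
```

Under these assumptions, a 20 meter by 14 meter room places the corners 10 meters and 7 meters from the origin along each axis, and the sound object S lands at [0, 7].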

The system decoder 1132 can generate an output data stream for each cell encoder that can include (but is not limited to) audio signals for each of the sound objects and spatial location metadata. In several embodiments, the spatial location metadata describes the spatial relationship between the cell and the locations of the audio objects utilized by the system decoder 1132 in the ambisonic decoding of the ambisonic representation of the spatial audio objects generated by the system encoder 1112. As shown in FIG. 11, where there are n cells, the system decoder 1132 may provide n distinct data streams as separate outputs 1142 to each of the n cells, where each data stream includes sound information for a specific cell. Furthermore, each of the data streams for each of the n cells can include multiple audio streams. As discussed above, each audio stream may correspond to a direction relative to the cell.

In addition to the system encoder 1112, the system 1100 also includes encoder functionality at the cell-level. In accordance with various aspects of the disclosure, the system 1100 can include a second encoder associated with each cell, illustrated as cell encoders 1152-1 to 1152-n in FIG. 11. In one aspect, each of the cell encoders 1152-1 to 1152-n is responsible for generating sound field information at a cell-level for its associated cell from the sound information received from the system decoder 1132. Specifically, each of the cell encoders 1152-1 to 1152-n can receive sound information from the output 1142 from the system decoder 1132.

Each of the cell encoders 1152-1 to 1152-n may provide a cell-level sound field representation output to a respective cell decoder that includes directivity and steering information. In one aspect of the disclosure, the cell-level sound field representation output from each cell encoder is a sound field representation relative to its respective cell and not the origin of the system. A given cell encoder can utilize information concerning the locations of each sound object and/or virtual speaker and the cell relative to the system origin and/or relative to each other to encode the cell-level sound field representation. From this information, each of the cell encoders 1152-1 to 1152-n may determine a distance and an angle from its associated cell to each sound object, such as the sound object S 1212.

Referring to FIG. 12C, for example, where there are three cells (n=3), a first cell encoder 1152_SN1 for cell 1 1270_SN1 may use the sound information in the n-channel output 1142 to determine that the sound object S 1212 is at a distance d_SN1 at an angle theta_SN1 with respect to cell 1 1270_SN1. Similarly, a second cell encoder 1152_SN2 and a third cell encoder 1152_SN3 that are associated with cell 2 1270_SN2 and cell 3 1270_SN3, respectively, may use the sound information in the n-channel output 1142 to determine distances and angles from each of these cells to the sound object S 1212. In one aspect of the disclosure, each cell encoder may only receive its associated channel from the n-channel output 1142. In many embodiments, a similar process is performed during cell encoding based upon the locations of virtual speakers relative to a cell.
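As a hedged illustration of the geometry involved, the distance and angle from a cell to a sound object can be computed from two-dimensional positions expressed relative to the system origin. The function name, the cell heading parameter, and the example coordinates below are hypothetical and are not taken from the figures.

```python
import math


def distance_and_angle(cell_xy, cell_heading_rad, object_xy):
    """Return (distance, angle) of a sound object as seen from a cell.

    Positions are (x, y) in the system coordinate frame. The angle is
    measured counter-clockwise from the direction the cell faces
    (an assumed convention) and wrapped to the range (-pi, pi].
    """
    dx = object_xy[0] - cell_xy[0]
    dy = object_xy[1] - cell_xy[1]
    distance = math.hypot(dx, dy)
    bearing = math.atan2(dy, dx)
    relative = bearing - cell_heading_rad
    angle = math.atan2(math.sin(relative), math.cos(relative))
    return distance, angle


# Hypothetical layout: a cell at (3, 3) facing along +x, object S at (0, 7).
d_sn1, theta_sn1 = distance_and_angle((3.0, 3.0), 0.0, (0.0, 7.0))
```

Each cell encoder could evaluate the same function with its own position to obtain its own distance and angle to the sound object S 1212, or to each virtual speaker.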

The cell-level sound field representation outputs from all of the cell encoders 1152-1 to 1152-n are collectively illustrated in FIG. 11 as cell-level sound field representation information 1162.

Based on the cell-level sound field representation output 1162 received from the cell encoders 1152-1 to 1152-n, which can be located in each of the n cells or on a single primary cell, a local cell decoder 1172-1 to 1172-n can render audio to drivers contained in the cell, collectively illustrated as transducer information 1182. Continuing with the example above, groups of drivers 1192-1 to 1192-n are also associated with respective cell decoders 1172-1 to 1172-n, where one group of drivers is associated with each cell and, more specifically, each cell decoder. It should be noted that the orientation and number of drivers in a group of drivers for a cell are provided as examples and the cell decoder contained therein may adapt to any specific orientation or number of loudspeakers. Furthermore, a cell can have a single driver and different cells within a spatial audio system can have different sets of drivers.

In one aspect of the disclosure, each cell decoder provides transducer information based on physical driver geometry of each respective cell. As further described herein, the transducer information may be converted to generate electrical signals that are specific to each driver in the cell. For example, a first cell decoder 1172_SN1 for cell 1 1270_SN1 may provide transducer information for each of the drivers in the cell 1294_S1, 1294_S2, and 1294_S3. Similarly, a second cell decoder 1172_SN2 and a third cell decoder 1172_SN3 may provide transducer information for each of the drivers in cell 2 1270_SN2 and cell 3 1270_SN3, respectively.

Referring to FIG. 12D in addition to FIG. 12C, if cell 1 1270_SN1 is to render the sound object S 1212 at the angle theta_SN1 and the distance d_SN1, where cell 1 1270_SN1 includes three drivers illustrated as a first driver 1294_S1, a second driver 1294_S2, and a third driver 1294_S3, the first cell decoder 1172_SN1 may provide transducer information to each of these three drivers. As can readily be appreciated, the specific signals generated by a cell decoder are largely dependent upon the configuration of the cell.

While specific processes for rendering sound fields from arbitrary audio sources using ambisonics are described above, any of a variety of audio signal processing pipelines can be utilized to render sound fields using multiple cells in a manner that is independent from a number of channels and/or speaker-layout assumption utilized in the original encoding of an audio source as appropriate to the requirements of specific applications in accordance with various embodiments of the invention. For example, nested architectures can be utilized that employ other spatial audio representations in combination with or as an alternative to ambisonic representations including (but not limited to) higher order ambisonics (e.g. sound field representation), VBAP representations, DBAP representations, and/or KNN panning representations. Specific processes for rendering sound fields that utilize spatial audio reproduction techniques to generate audio inputs for a set of virtual speakers that are then utilized by individual cells to generate driver inputs in accordance with various embodiments of the invention are discussed further below.

SECTION 4B: Nested Architectures that Utilize Virtual Speakers

Spatial audio reproduction techniques in accordance with various embodiments of the invention can be used to render an arbitrary piece of source audio content on any arbitrary arrangement of cells, regardless of the number of channels of the source audio content. For example, source audio encoded in a 5.1 surround sound format is normally rendered using 5 speakers and a dedicated subwoofer. However, systems and methods described herein can render the same content in the same quality using a smaller number of cells. Turning now to FIGS. 13A-D, a visual representation of ambisonic rendering techniques utilized to map 5.1 channel audio to three cells in accordance with an embodiment of the invention is illustrated. As can be readily appreciated, the example shown in FIGS. 13A-D is generalizable to any arbitrary number of input channels to any arbitrary number of cells. Furthermore, channel based audio can be upmixed and/or downmixed to create a number of spatial audio objects that is different to the number of channels used in the encoding of audio. In addition, the processes described herein are not limited to the use of ambisonic representations of spatial audio.

FIG. 13A illustrates a desired 5.1 channel speaker configuration. The 5.1 format has three forward speakers and two rear speakers, where the forward and rear speakers fire toward each other. The 5.1 channel speaker configuration is set up so that a point at the center of the configuration is the focus of the surround sound. Using this information, a ring of virtual speakers can be established with the same focus. This ring of virtual speakers in accordance with an embodiment of the invention is illustrated in FIG. 13B. In this example, eight virtual speakers are instantiated, although the number can be higher or lower depending on the number of cells used and/or the degree of spatial separation desired. In many embodiments, the ring of virtual speakers emulates an ambisonic loudspeaker array. Ambisonic encoding can be used to map the 5.1 channel audio to the ring of virtual loudspeakers by calculating the ambisonic representation required to create a sound field that matches the sound field generated by the 5.1 channel speaker system. Using the ambisonic representation, each virtual speaker can be assigned an audio signal, which, if rendered, would create said sound field. Alternative spatial audio rendering techniques can be utilized to encode the 5.1 channel audio to any of a variety of spatial audio representations, which is then decoded based upon an array of virtual speakers, using a representation such as (but not limited to) higher order ambisonics (e.g. sound field representation), a VBAP representation, a DBAP representation, and/or a KNN panning representation.
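The mapping of 5.1 channel audio onto a ring of eight virtual speakers can be sketched, under stated assumptions, as a horizontal first order ambisonic encode followed by a basic sampling decode. The channel azimuths below follow a commonly used 5.1 placement (front left and right at ±30 degrees, surrounds at ±110 degrees), the low frequency channel is assumed to be routed separately, and the unnormalized W, X, Y convention and all function names are illustrative choices rather than the method of any particular embodiment.

```python
import math

# Assumed channel directions in degrees, counter-clockwise from front.
CHANNEL_AZIMUTH_DEG = {"L": 30.0, "R": -30.0, "C": 0.0, "Ls": 110.0, "Rs": -110.0}


def encode_first_order_2d(frame):
    """Encode one sample per channel into horizontal components (W, X, Y)."""
    w = x = y = 0.0
    for name, sample in frame.items():
        azimuth = math.radians(CHANNEL_AZIMUTH_DEG[name])
        w += sample
        x += sample * math.cos(azimuth)
        y += sample * math.sin(azimuth)
    return w, x, y


def ring_azimuths(count):
    """Azimuths of `count` virtual speakers evenly spaced on a circle."""
    return [2.0 * math.pi * n / count for n in range(count)]


def decode_to_ring(wxy, count=8):
    """Basic sampling decode of (W, X, Y) to an evenly spaced ring."""
    w, x, y = wxy
    return [
        (w + 2.0 * (x * math.cos(a) + y * math.sin(a))) / count
        for a in ring_azimuths(count)
    ]


frame = {"L": 0.5, "R": 0.25, "C": 1.0, "Ls": 0.0, "Rs": -0.125}
wxy = encode_first_order_2d(frame)
virtual_feeds = decode_to_ring(wxy)
```

With this decode, re-encoding the eight virtual speaker feeds at their ring positions reproduces the same W, X, and Y components, which is one way to check that the ring carries the first order representation of the original sound field.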

Due to the modal beamforming capabilities of the cells utilized in many embodiments of the invention, which enable them to render sound objects, the virtual speakers can be assigned to cells in a group as sound objects. The cells each can encode the audio signals associated with the virtual speakers that they are assigned into a spatial audio representation, which the cell can then decode to obtain a set of signals to drive the drivers contained within the cell. In this way, the cells can collectively render the desired sound field. A three cell arrangement rendering the 5.1 channel audio in accordance with an embodiment of the invention is illustrated in FIG. 13C. In some embodiments, an aerial cell (located on a higher horizontal plane than the other cells) can be introduced to more closely approximate an ambisonic speaker array. An example configuration that includes an aerial cell in accordance with an embodiment of the invention is illustrated in FIG. 13D. While specific examples are described above with reference to FIGS. 13A-13D based upon a 5.1 channel source and groups including 3 or 4 cells, any of a variety of mappings of any number of channels (including a single channel) to one or more spatial audio objects (including by upmixing and/or downmixing of channels) for rendering by an arbitrary configuration of a group of one or more cells can be performed using processes similar to any of the processes described herein as appropriate to the requirements of specific applications in accordance with various embodiments of the invention.

FIG. 14 illustrates a sound information process 1400 for processing sound information that may be implemented by a system for spatial audio control and reproduction in accordance with various aspects of the present disclosure. At 1410, sound information, which can include sound objects, is received by a system encoder. At 1420, a map of the cell locations can be obtained. At 1430, the system encoder creates a sound field representation using sound information for a set of sound objects. In general, the system encoder generates the sound field representation of the sound objects at a system-level. In one aspect of the present disclosure, this system-level sound field representation includes position information of the sound objects in the sound information. For example, the system encoder may generate the sound field information by mapping sound objects contained in the sound information. The sound field information may utilize an ambisonic representation that includes components W, which is the omnidirectional component, X and Y, and, if applicable, Z. As noted above, alternative spatial audio representations can be utilized including (but not limited to) higher order ambisonics (e.g. sound field representation), a VBAP representation, a DBAP representation, and/or a KNN panning representation. The position information can be defined with respect to an origin selected by the system encoder, which is referred to as the “system origin” because the system encoder has determined the origin.
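As a minimal sketch of how W, X, Y, and Z components can be formed for a sound object, the following uses the traditional first order B-format convention in which W is scaled by 1/√2. Other normalizations exist, and the function names and the azimuth and elevation conventions here are assumptions made for illustration.

```python
import math


def encode_b_format(sample, azimuth_rad, elevation_rad=0.0):
    """Encode one sample of a sound object into first order (W, X, Y, Z).

    Azimuth is measured counter-clockwise from the +x axis of the system
    coordinate frame and elevation upward from the horizontal plane
    (assumed conventions). W is the omnidirectional component.
    """
    w = sample / math.sqrt(2.0)
    x = sample * math.cos(azimuth_rad) * math.cos(elevation_rad)
    y = sample * math.sin(azimuth_rad) * math.cos(elevation_rad)
    z = sample * math.sin(elevation_rad)
    return (w, x, y, z)


def mix(encoded_objects):
    """Sum the encodings of several objects into one sound field."""
    return tuple(sum(component) for component in zip(*encoded_objects))


# A sound object mapped to [0, 1] relative to the system origin.
encoded_s = encode_b_format(1.0, math.atan2(1.0, 0.0))
```

Because the encoding is linear, the sound field for a set of sound objects is the component-wise sum of the individual encodings, so the number of components does not grow with the number of objects.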

At 1440, a system decoder receives the sound field information, which includes the system-level sound field representation generated by the system encoder using the sound information. The system decoder, using the system-level sound field representation and an awareness of the layout and number of the cells in the system, may generate a per-cell output in the form of an n-channel output. As discussed, in one aspect of the disclosure, information in the n-channel output is based on the number and layout of cells in the system. In many embodiments, the decoder utilizes the layout of the cells to define a set of virtual speakers and generates a set of audio inputs for a set of virtual speakers. The specific channel output from the n-channel output that is provided to a given cell can include one or more of the audio inputs for the set of virtual speakers and information concerning the locations of those virtual speakers. In several embodiments, a primary cell utilizes the virtual speakers to decode a set of audio signals for each of the cells (e.g. the primary cell performs processing to generate cell signals based on representations of sound information for each virtual speaker 1460). In a number of embodiments, each audio signal decoded for a particular cell corresponds to a set of drivers oriented in a specific direction. When a cell has, for example, three sets of drivers oriented in different directions, then the primary cell can decode three audio signals (one for each set of drivers) from all or a subset of the audio signals for the virtual speakers. When the primary cell decodes a set of audio signals for each of the cells, then it is these signals that are the n-channel output that is provided to a given cell.

At 1450, each cell encoder receives one of the n channels of sound information for the set of virtual speakers in the n-channel output generated by the system decoder. Each cell encoder can determine sound field representation information at a cell level from the audio inputs to the virtual speakers and the locations of the virtual speakers, which can allow a respective cell decoder to later generate appropriate transducer information for one or more drivers associated therewith, as further discussed herein. Specifically, each cell encoder in a cell passes its sound field representation information to its associated cell decoder in outputs that can be collectively referred to as the cell-level sound field representation information. The associated cell decoder can then decode the cell-level sound field representation information to output 1460 individual driver signals to the drivers. In one aspect of the disclosure, this cell-level sound field representation information is provided as information to attenuate the audio to be generated from each cell. In other words, the signal is being attenuated by a certain amount to bias it in a particular direction (e.g., panning). In many embodiments, the virtual speaker inputs can be directly transformed to individual driver signals using a set of filters such as (but not limited to) a set of FIR filters. As can readily be appreciated, generation of driver signals using filters is an efficient technique for performing the nested encoding and decoding of the virtual speaker inputs in a manner that accounts for a fixed relationship between the virtual speaker locations and the cell locations irrespective of the locations of the spatial audio objects rendered by the cells.
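The direct transformation of virtual speaker inputs to driver signals using FIR filters can be sketched as a filter bank in which each driver signal is the sum of every virtual speaker input convolved with a filter specific to that driver and virtual speaker pair. The function names, the data layout, and the example filter taps below are hypothetical; in practice the taps would be derived from the fixed geometry between the virtual speakers and the cell.

```python
def fir_filter(signal, taps):
    """Apply an FIR filter (direct-form convolution, truncated to input length)."""
    output = []
    for n in range(len(signal)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * signal[n - k]
        output.append(acc)
    return output


def virtual_speakers_to_drivers(virtual_inputs, filter_bank):
    """Mix filtered virtual speaker inputs into one signal per driver.

    filter_bank[d][v] holds the taps applied to virtual speaker v when
    forming the signal for driver d.
    """
    num_samples = len(virtual_inputs[0])
    driver_signals = []
    for per_driver_filters in filter_bank:
        mixed = [0.0] * num_samples
        for v_signal, taps in zip(virtual_inputs, per_driver_filters):
            filtered = fir_filter(v_signal, taps)
            mixed = [m + f for m, f in zip(mixed, filtered)]
        driver_signals.append(mixed)
    return driver_signals


# Two virtual speakers, two drivers, made-up taps for illustration only.
virtual_inputs = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
filter_bank = [
    [[0.5, 0.25], [0.0]],   # driver 0
    [[0.0], [1.0, -0.5]],   # driver 1
]
driver_signals = virtual_speakers_to_drivers(virtual_inputs, filter_bank)
```

Because the filters depend only on the fixed relationship between the virtual speaker locations and the cell, they can be computed once and reused regardless of where the spatial audio objects are placed.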

In several embodiments, the cell encoder and cell decoder can use ambisonics to control the directivity of the signals produced by each cell. In a number of embodiments, first order ambisonics are utilized within the process for encoding and/or decoding audio signals for a specific cell based upon the audio inputs of the set of virtual speakers. In a number of embodiments, a weighted sampling decoder is utilized to generate a set of audio signals for a cell. In several embodiments, additional side rejection is obtained in beams formed by a cell using higher order ambisonics including (but not limited to) supercardioids and/or hypercardioids. In this way, the use of a decoder that relies upon higher order ambisonics can achieve greater directivity and less crosstalk between sets of drivers (e.g. horns) of the cells utilized within spatial audio systems in accordance with various embodiments of the invention. In several embodiments, maximum energy vector magnitude weighting can be utilized to implement a higher order ambisonic decoder utilized to decode audio signals for a cell within a spatial audio system. As can readily be appreciated, any of a variety of spatial audio decoders can be utilized to generate audio signals for a cell based upon a number of virtual speaker input signals and their locations as appropriate to the requirements of specific applications in accordance with various embodiments of the invention.
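To illustrate how the choice of beam pattern trades off side rejection, the sketch below evaluates the textbook family of patterns g(θ) = a + (1 − a)·cos θ, of which the cardioid, supercardioid, and hypercardioid are members. This is a simplified illustration of directivity only, using the commonly cited coefficients for these patterns; it is not a higher order decoder and is not presented as the decoder of any embodiment.

```python
import math

# Commonly cited omnidirectional weights `a` for each named pattern.
PATTERNS = {
    "cardioid": 0.5,
    "supercardioid": (math.sqrt(3.0) - 1.0) / 2.0,
    "hypercardioid": 0.25,
}


def pattern_gain(a, theta_rad):
    """Gain of the pattern a + (1 - a) * cos(theta) at angle theta off axis."""
    return a + (1.0 - a) * math.cos(theta_rad)


def null_angle_deg(a):
    """Off-axis angle (degrees) at which the pattern's gain falls to zero."""
    return math.degrees(math.acos(-a / (1.0 - a)))


# Gain toward the side (90 degrees off axis) for each pattern.
side_gain = {name: pattern_gain(a, math.pi / 2.0) for name, a in PATTERNS.items()}
nulls = {name: null_angle_deg(a) for name, a in PATTERNS.items()}
```

In this family the gain at 90 degrees equals the coefficient a, so the supercardioid and hypercardioid radiate less energy to the sides than the cardioid, at the cost of a rear lobe beyond their null angles.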

As is discussed further below, the perceived distance and direction of spatial audio objects can be controlled by modifying the directivity and/or direction of the audio produced by the cells in ways that modify characteristics of the sound including (but not limited to) the ratio of the power of direct audio to the power of diffuse audio perceived by one or more listeners located proximate a cell or group of cells. While various processes for decoding audio signals for specific cells in a nested architecture utilizing virtual speakers are described above, cell decoders similar to the cell decoders described herein can be utilized in any of a variety of spatial audio systems including (but not limited to) spatial audio systems that do not rely upon the use of virtual speakers in the encoding of spatial audio and/or rely upon any of a variety of different numbers and/or configurations of virtual speakers in the encoding of spatial audio as appropriate to the requirements of specific applications in accordance with various embodiments of the invention. When multiple network-connected cells exist on a network, it can be beneficial to reduce the amount of traffic needed to flow over the network. This can reduce latency, which can be critical for synchronizing audio. As such, in a variety of embodiments, a primary cell can be responsible for encoding the spatial representation and decoding the spatial representation based upon a virtual speaker layout. The primary cell can then transmit the decoded signals for the virtual speakers to secondary cells for the remainder of the steps. In this fashion, the maximum number of audio signals to be transmitted across the network is independent of the number of spatial audio objects and instead depends upon the number of virtual speaker audio signals that are desired to be provided to each cell. As can readily be appreciated, the division between primary cell processing and secondary cell processing can be drawn at any arbitrary point with various benefits and consequences.

In many embodiments, drivers in the driver array of a cell may be arranged into one or more sets, which can each be driven by the cell decoder. In numerous embodiments, each driver set contains at least one mid and at least one tweeter. However, different numbers of drivers and classes of drivers can make up a driver set, including, but not limited to, all one type of driver as appropriate to the requirements of specific applications in accordance with various embodiments of the invention. For example, FIG. 15 illustrates sets of drivers in a driver array of a cell in accordance with an embodiment of the invention. A cell decoder 1500 drives a driver array 1510, which includes a first set of mid/high drivers 1512-1, a second set of mid/high drivers 1512-2, and a third set of mid/high drivers 1512-3. Each driver set may include one or more audio transducers of different types, such as one or more bass, mid-range, and tweeter speakers. In one aspect of the disclosure, a separate audio signal may be generated for each loudspeaker set in a loudspeaker array, and a bandpass filter such as a crossover may be used so that the transducer information generated by the cell decoder 1500 may be divided into different band-passed signals for each of the different types of driver in a particular driver set. In the illustrated embodiment, each of the mid/high driver sets includes a mid 1513-1 and a tweeter 1513-2. In many embodiments, the driver array further includes a woofer driver set 1514. In many embodiments, the woofer driver set includes two woofers. However, any number of woofers, including no woofers, one woofer, or n woofers can be utilized as appropriate to the requirements of specific applications in accordance with various embodiments of the invention.
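The division of a driver set's signal into band-limited signals for a mid and a tweeter can be sketched with a deliberately simple two-way crossover: a one-pole low-pass filter feeds the mid and the complementary residual feeds the tweeter. The function name, the 2 kHz crossover frequency, and the 48 kHz sample rate are assumptions for illustration; practical crossovers typically use higher order filters.

```python
import math


def one_pole_crossover(signal, crossover_hz, sample_rate_hz):
    """Split a signal into complementary low and high bands.

    The low band is a one-pole low-pass of the input and the high band is
    the input minus the low band, so the two bands sum back to the input.
    """
    alpha = 1.0 - math.exp(-2.0 * math.pi * crossover_hz / sample_rate_hz)
    low_band, high_band = [], []
    state = 0.0
    for sample in signal:
        state += alpha * (sample - state)
        low_band.append(state)
        high_band.append(sample - state)
    return low_band, high_band


# A constant (DC) input should end up almost entirely in the low band.
dc_input = [1.0] * 200
mid_signal, tweeter_signal = one_pole_crossover(dc_input, 2000.0, 48000.0)
```

The complementary construction is chosen here so that the mid and tweeter signals always sum to the original signal, which keeps the sketch easy to verify.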

In a number of embodiments, the perceived quality of the spatial audio rendered by a spatial audio system can be enhanced by using directional audio to control the perceived ratio of direct and reverberant sound in the rendered sound field. In many embodiments, increased reverberant sound is achieved using modal beamforming to direct beams to reflect off walls and/or other surfaces within a space. In this way, the ratio between direct and reverberant sound can be controlled by rendering audio that includes direct components in a first direction and additional indirect audio components in additional directions that will reflect off nearby surfaces. Various techniques that can be utilized to achieve immersive spatial audio using directional audio in accordance with a number of different embodiments of the invention are discussed below.

Turning now to FIG. 16, a process for rendering spatial audio in a diffuse and directed fashion in accordance with an embodiment of the invention is illustrated. Process 1600 includes obtaining (1610) all or a portion of an audio file and obtaining (1620) a cell location map. Using this information, a direct audio spatial representation is encoded (1630). The direct representation can include the information regarding direct sound (rather than diffuse sound). The direct representation can be decoded (1640) using a virtual speaker layout and then the output is encoded (1650) for the true cell layout. This encoded information can contain spatial audio information that can be used to generate the direct portion of the sound field associated with the source audio. In substantially real time, a distance scaling process can be performed (1660) and a diffuse spatial representation encoded (1670). This diffuse representation can be decoded (1680) using the virtual speaker layout and encoded (1690) for the true cell layout to control the perceived ratio between direct and reverberant sound. The diffuse and direct representations can be decoded (1695) by the cells to render the desired sound field.
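By way of illustration only, the distance scaling step (1660) can be sketched as a pair of gains applied to the direct and diffuse representations. The specification does not state a particular scaling law, so the sketch below assumes a common room-acoustics heuristic: the direct gain falls off with the inverse of distance while the diffuse gain stays roughly constant. The function names, the reference distance, and the diffuse level are hypothetical.

```python
import math


def distance_gains(distance_m, reference_m=1.0, diffuse_level=0.3):
    """Return (direct_gain, diffuse_gain) for an audio object.

    Assumed model (not taken from the specification): direct sound follows
    an inverse-distance law beyond a reference distance, and the diffuse
    (reverberant) level is treated as independent of distance.
    """
    d = max(distance_m, reference_m)  # avoid boosting sources closer than the reference
    direct = reference_m / d
    diffuse = diffuse_level
    return direct, diffuse


def direct_to_diffuse_db(distance_m, reference_m=1.0, diffuse_level=0.3):
    """Direct-to-diffuse ratio in decibels for the assumed model."""
    direct, diffuse = distance_gains(distance_m, reference_m, diffuse_level)
    return 20.0 * math.log10(direct / diffuse)
```

Under these assumptions, moving an object farther away lowers the direct-to-diffuse ratio, which is the cue the process uses to control perceived distance.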

As can be appreciated from the discussion above, the ability to determine spatial information including (but not limited to) the relative position and orientation of cells in a space, and the acoustic characteristics of a space can greatly assist with the rendering of spatial audio. In a number of embodiments, ranging processes are utilized to determine various characteristics of the placement and orientation of cells and/or the space in which the cells are placed. This information can then be utilized to determine virtual speaker locations. Collectively, spatial data including (but not limited to) spatial data describing cells, a space, location of a listener, historic locations of listeners, and/or virtual speaker locations can be referred to as spatial location metadata. Various processes for generating spatial location metadata and distributing some or all of the spatial location metadata to various cells within a spatial audio system in accordance with various embodiments of the invention are described below.

Turning now to FIG. 17, a process for propagating virtual speaker placements to cells in accordance with an embodiment of the invention is illustrated. Process 1700 includes mapping (1710) the space. As noted above, space mapping can be performed by cells and/or other devices using any of a number of techniques. In a variety of embodiments, mapping a space includes determining the acoustic reflectivity of various objects and barriers in the space.

Process 1700 further includes locating (1720) neighboring cells. In numerous embodiments, cells can be located by other cells using acoustic signaling. Cells can also be identified via visual confirmation using a network connected camera (e.g. a mobile phone camera). Once cells in a region have been located, a group can be configured (1730). Based on the location of the speakers in the group, a virtual speaker placement can be generated (1740). The virtual speaker placement can then be propagated (1750) to other cells. In numerous embodiments, a primary cell generates the virtual speaker placements and propagates the placements to secondary cells connected to the primary. In many embodiments, more than one virtual speaker placement can be generated. For example, conventional 2, 2.1, 5.1, 5.1.2, 5.1.4, 7.1, 7.1.2, 7.1.4, 9.1.2, 9.1.4, and 11.1 speaker placements including speaker placements recommended in conjunction with various audio encoding formats including (but not limited to) Dolby Digital, Dolby Digital Plus, and Dolby Atmos, as developed by Dolby Laboratories, Inc. may be generated as they are more common. However, virtual speaker placements can be generated on the fly using the map.
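As a purely illustrative sketch of generating (1740) a virtual speaker placement on the fly from located cells, the following places a chosen number of virtual speakers evenly on a ring around the centroid of the group. The ring geometry, the margin, and the function name are assumptions made for this example; the specification does not prescribe how placements are derived from the map.

```python
import math


def ring_placement(cell_positions, count=5, margin_m=1.0):
    """Place `count` virtual speakers on a circle enclosing the cells.

    cell_positions: list of (x, y) cell coordinates in metres.
    The circle is centred on the centroid of the cells and its radius is
    the distance to the farthest cell plus a hypothetical margin.
    """
    cx = sum(x for x, _ in cell_positions) / len(cell_positions)
    cy = sum(y for _, y in cell_positions) / len(cell_positions)
    radius = max(math.hypot(x - cx, y - cy) for x, y in cell_positions) + margin_m
    return [
        (cx + radius * math.cos(2.0 * math.pi * k / count),
         cy + radius * math.sin(2.0 * math.pi * k / count))
        for k in range(count)
    ]
```

A primary cell could compute such a list once and propagate (1750) it to its secondary cells as part of the spatial location metadata.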

As noted above, the components of a nested architecture of spatial encoders and spatial decoders can be implemented within individual cells within a spatial audio system in a variety of ways. Software of a cell that can be configured to act as a primary cell or a secondary cell within a spatial audio system in accordance with an embodiment of the invention is conceptually illustrated in FIG. 48. The cell 4800 includes a series of drivers including (but not limited to) hardware drivers, and interface connector drivers such as (but not limited to) USB and HDMI drivers. The drivers enable the software of the cell 4800 to capture audio signals using one or more microphones and to generate driver signals (e.g. using a digital to analog converter) for the one or more drivers in the cell. As can readily be appreciated, the specific drivers utilized by a cell are largely dependent upon the hardware of the cell.

In the illustrated embodiment, an audio and midi application 4802 is provided to manage information passing between various software processes executing on the processing system of the cell and the hardware drivers. In several embodiments, the audio and midi application is capable of decoding audio signals for rendering on the sets of drivers of the cell. Any of the processes described herein for decoding audio for rendering on a cell can be utilized by the audio and midi application including the processes discussed in detail below.

Hardware audio source processes 4804 manage communication with external sources via the interface connector drivers. The interface connector drivers can enable audio sources to be directly connected to the cell. Audio signals can be routed between the drivers and various software processes executing on the processing system of the cell using an audio server 4806.

As noted above, audio signals captured by microphones can be utilized for a variety of applications including (but not limited to) calibration, equalization, ranging, and/or voice command control. In the illustrated embodiment, audio signals from the microphone can be routed from the audio and midi application 4802 to a microphone processor 4808 using the audio server 4806. The microphone processor can perform functions associated with the manner in which the cell generates spatial audio such as (but not limited to) calibration, equalization, and/or ranging. In several embodiments, the microphone is utilized to capture voice commands and the microphone processor can process the microphone signals and provide them to word detection and/or voice assistant clients 4810. When command words are detected, the voice assistant clients 4810 can provide audio and/or audio commands to cloud services for additional processing. The voice assistant clients 4810 can also provide responses from the voice assistant cloud services to the application software of the cell (e.g. mapping voice commands to controls of the cell). The application software of the cell can then implement the voice commands as appropriate to the specific voice command.

In several embodiments, the cell receives audio from a network audio source. In the illustrated embodiment, a network audio source process 4812 is provided to manage communication with one or more remote audio sources. The network audio source process can manage authentication, streaming, digital rights management, and/or any other processes that the cell is required to perform by a particular network audio source to receive and play back audio. As is discussed further below, the received audio can be forwarded to other cells using a source server process 4814 or provided to a sound server 4816.

The cell can forward a source to another cell using the source server 4814. The source can be (but is not limited to) an audio source directly connected to the cell via a connector, and/or a source obtained from a network audio source via the network audio source process 4812. Sources can be forwarded between a primary in a first group of cells and a primary in a second group of cells to synchronize playback of the source between the two groups of cells. The cell can also receive one or more sources from another cell or a network connected source input device via the source server 4814.

The sound server 4816 can coordinate audio playback on the cell. When the cell is configured as a primary, the sound server 4816 can also coordinate audio playback on secondary cells. When the cell is configured as a primary, the sound server 4816 can receive an audio source and process the audio source for rendering using the drivers on the cell. As can readily be appreciated, any of a variety of spatial audio processing techniques can be utilized to process the audio source to obtain spatial audio objects and to render audio using the cell's drivers based upon the spatial audio objects. In a number of embodiments, the cell software implements a nested architecture similar to the various nested architectures described above in which the source audio is used to obtain spatial audio objects. The sound server 4816 can generate the appropriate spatial audio objects for a particular audio source and then spatially encode the spatial audio objects. In several embodiments, the audio sources can already be spatially encoded (e.g. encoded in an ambisonic format) and so the sound server 4816 need not perform spatial encoding. The sound server 4816 can decode spatial audio to a virtual speaker layout. The audio signals for the virtual speakers can then be used by the sound server to decode audio signals specific to the location of the cell and/or locations of cells within a group. In several embodiments, the process of obtaining audio signals for each cell involves spatially encoding the audio inputs of the virtual speakers based upon the location of the cell and/or other cells within a group of cells. The spatial audio for each cell can then be decoded into separate audio signals for each set of drivers included in the cell. In a number of embodiments, the audio signal for the cell can be provided to the audio and midi application 4802, which generates the individual driver inputs.

Where the cell is a primary cell within a group of cells, the sound server 4816 can transmit the audio signals for each of the secondary cells over the network. In many embodiments, the audio signals are transmitted via unicast. In several embodiments, some of the audio signals are unicast and at least one signal is multicast (e.g. a bass signal that is used for rendering by all cells within a group). In a number of embodiments, the sound server 4816 generates direct and diffuse audio signals that are utilized by the audio and midi application 4802 to generate inputs to the cell's drivers using the hardware drivers. Direct and diffuse signals can also be generated by the sound server 4816 and provided to secondary cells.

When the cell is a secondary cell, the sound server 4816 can receive audio signals that were generated on a primary cell and provided to the cell via a network. The cell can route the received audio signals to the audio and midi application 4802, which generates the individual driver inputs in the same manner as if the audio signals had been generated by the cell itself.

Various potential implementations of sound servers can be utilized in cells similar to those described above with reference to FIG. 48 and/or in any of a variety of other types of cells that can be utilized within spatial audio systems in accordance with certain embodiments of the invention. A sound server software implementation that can be utilized in a cell within a spatial audio system in accordance with an embodiment of the invention is conceptually illustrated in FIG. 49. The sound server 4900 utilizes source graphs 4902 to process particular audio sources for input into appropriate spatial encoders 4904 as appropriate to the requirements of specific applications. In several embodiments, multiple sources can be mixed. In the illustrated embodiment, a mix engine 4906 mixes spatially encoded audio from each of the sources. The mixed spatially encoded audio is provided to at least a local decoder 4908, which decodes the spatially encoded audio into audio signals specific to the cell that can be utilized to render driver signals for the sets of drivers within the cell. The mixed spatially encoded audio signal can be provided to one or more secondary decoders 4910. Each secondary decoder is capable of decoding spatially encoded audio into audio signals specific to a particular secondary cell based upon the location of the cell and/or the layout of the environment in which the group of cells is located. In this way, a primary cell can generate audio signals for each cell in a group of cells. In the illustrated embodiment, a secondary send process 4912 is utilized to transmit the audio signals via a network to the secondary cells.

The source graphs 4902 can be configured in a variety of different ways depending upon the nature of the audio. In several embodiments, the cell can receive sources that are mono, stereo, any of a variety of multichannel surround sound formats, and/or audio encoded in accordance with an ambisonic format. Depending upon the encoding of the audio, the source graph can map an audio signal or an audio channel to an audio object. As discussed above, the received source can be upmixed and/or downmixed to create a number of audio objects that is different to the number of audio signals/audio channels provided by the audio source. When the audio is encoded in an ambisonic format, the source graph may be able to forward the audio source directly to the spatial encoder. In several embodiments, the ambisonic format may be incompatible with the spatial encoder and the audio source must be reencoded in an ambisonic format that is an appropriate input for the spatial encoder. As can readily be appreciated, an advantage of utilizing source graphs to process sources for input to a spatial encoder is that additional source graphs can be developed to support additional formats as appropriate to the requirements of specific applications.

A variety of spatial encoders can be utilized in sound servers similar to the sound server shown in FIG. 49. Furthermore, a specific cell may include a number of different spatial encoders that can be utilized based upon factors including (but not limited to) any one or more of: the type of audio source, the number of cells, and/or the placement of cells. For example, the spatial encoding utilized can vary depending upon whether the cells are grouped in a first configuration in which multiple cells are substantially on the same plane or in a second configuration in which the group of cells also includes at least one cell mounted overhead (e.g. ceiling mounted).

A spatial encoder that can be utilized to encode a mono source in any of the sound servers described herein in accordance with an embodiment of the invention is conceptually illustrated in FIG. 50. The spatial encoder 5000 accepts as an input an individual mono audio object and information concerning the location of the audio object. In many embodiments, the location information can be expressed in Cartesian and/or radial coordinates relative to a system origin in 2D or 3D. The spatial encoder 5000 utilizes a distance encoder 5002 to generate signals used to represent the direct and diffuse audio generated by the audio object. In the illustrated embodiment, a first ambisonic encoder 5004 is utilized to generate a higher order ambisonic representation (e.g. a second order ambisonic and/or sound field representation) of the direct audio generated by the audio object. In addition, a second ambisonic encoder 5006 is utilized to generate a higher order ambisonic representation of the diffuse audio (e.g. a second order ambisonic and/or sound field representation). A first ambisonic decoder 5008 decodes the higher order ambisonic representation of the direct audio into audio inputs for a set of virtual speakers. A second ambisonic decoder 5010 decodes the higher order ambisonic representation of the diffuse audio into audio inputs for the set of virtual speakers. While the spatial encoder described with respect to FIG. 50 utilizes higher order ambisonic representations of the direct and diffuse audio, spatial encoders can also use representations such as (but not limited to) a VBAP representation, a DBAP representation, and/or a KNN panning representation.
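For illustration, one encoder/decoder pair of the kind described above can be sketched for the horizontal plane. The sketch assumes a second order, two-dimensional (circular harmonic) ambisonic representation with unnormalized channels and a basic sampling decoder for a ring of evenly spaced virtual speakers; the channel ordering, normalization, and function names are assumptions for this example and are not taken from the specification. In the arrangement of FIG. 50, one such pair would be applied to the direct signal and a second pair to the diffuse signal.

```python
import math


def encode_2d(sample, azimuth_rad, order=2):
    """Encode a mono sample at a given azimuth into 2*order + 1 channels.

    Channel layout (assumed): [W, cos(1*az), sin(1*az), cos(2*az), sin(2*az), ...].
    """
    channels = [sample]
    for m in range(1, order + 1):
        channels.append(sample * math.cos(m * azimuth_rad))
        channels.append(sample * math.sin(m * azimuth_rad))
    return channels


def decode_2d(channels, speaker_azimuths_rad):
    """Basic sampling decoder for evenly spaced virtual speakers on a ring.

    Each speaker feed is the encoded sound field evaluated in the direction
    of that speaker, scaled by 1/N. At least 2*order + 1 speakers are needed.
    """
    order = (len(channels) - 1) // 2
    n = len(speaker_azimuths_rad)
    feeds = []
    for phi in speaker_azimuths_rad:
        acc = channels[0]
        for m in range(1, order + 1):
            acc += 2.0 * (channels[2 * m - 1] * math.cos(m * phi)
                          + channels[2 * m] * math.sin(m * phi))
        feeds.append(acc / n)
    return feeds
```

With eight virtual speakers at 45° spacing, a unit source encoded at 90° decodes to feeds that peak at the speaker located at 90° and sum to the original amplitude.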

As can be appreciated from the source encoder illustrated in FIG. 51, a source that is ambisonically encoded in a format that is compatible with the source encoder does not require separate ambisonic encoding. Instead, the source encoder 5100 can utilize a distance encoder 5102 to determine direct and diffuse audio for the ambisonic content. The ambisonic representations of the direct and diffuse audio can then be decoded to provide audio inputs for a set of virtual speakers. In the illustrated embodiment, a first ambisonic decoder 5104 decodes the ambisonic representation of the direct audio into inputs for a set of virtual speakers and a second ambisonic decoder 5106 decodes the ambisonic representation of the diffuse audio into inputs for the set of virtual speakers. While the discussion of source encoders above with respect to FIG. 51 references ambisonic encodings, any of a variety of representations of spatial audio can be similarly decoded into direct and/or diffuse inputs for a set of virtual speakers as appropriate to the requirements of specific applications in accordance with various embodiments of the invention.

As noted above, virtual speaker audio inputs can be directly decoded to provide feed signals for one or more sets of one or more drivers. In many embodiments, each set of drivers is oriented in a different direction and the virtual speaker audio inputs are utilized to generate an ambisonic, or other appropriate spatial representation, of the sound field generated by the cell. The spatial representation of the sound field generated by the cell can then be utilized to decode feed signals for each set of drivers. The following section discusses various embodiments of cells including a cell that has three horns distributed around the perimeter of the cell that are fed by mid and tweeter drivers. The cell also includes a pair of opposing woofers. A graph for generating individual driver feeds based upon three audio signals corresponding to feeds for each of the sets of drivers associated with each of the horns is illustrated in FIG. 52. In the illustrated embodiment, the graph 5200 generates driver signals for each of the tweeters and mids (six total) and the two woofers. The bass portions of each of the three feed signals are combined and low pass filtered 5202 to produce a bass signal to drive the woofers. In the illustrated embodiment, sub-processing is separately performed 5204, 5206 for each of the top and bottom sub-woofers and the resulting signals are provided to a limiter 5208 to ensure that the resulting signals will not cause damage to the drivers. Each of the feed signals is separately processed with respect to the higher frequency portions of the signal. The mid-frequencies and the high frequencies are separated using a set of frequencies 5210, 5212, and 5214, and the signals are provided to limiters 5216 to generate the six driver signals for the mid and tweeter drivers in each of the three horns. While a specific graph is shown in FIG. 52, any of a variety of graphs can be utilized as appropriate to the specific drivers utilized within a cell based upon separate feed signals for each set of drivers. In a number of embodiments, a separate low frequency feed can be provided to the cell that is used to drive the sub-woofers. In certain embodiments, the same low frequency feed is provided to all cells within a group. As can readily be appreciated, the specific feeds and particular manner in which a cell implements a graph to generate driver feeds are largely dependent upon the requirements of specific applications in accordance with various embodiments of the invention.
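The overall shape of such a graph can be sketched as follows. This is a simplified illustration under stated assumptions rather than the graph of FIG. 52: it uses a one-pole low-pass filter and its complement as the band splits, a hard clip as a stand-in for a limiter, the same bass signal for both woofers, and hypothetical cutoff frequencies and signal names.

```python
import math


def one_pole_lowpass(samples, cutoff_hz, sample_rate_hz=48000.0):
    """First-order low-pass filter (assumed filter type for this sketch)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    out, state = [], 0.0
    for x in samples:
        state += alpha * (x - state)
        out.append(state)
    return out


def limit(samples, ceiling=1.0):
    """Hard clip standing in for a limiter; practical limiters smooth their gain."""
    return [max(-ceiling, min(ceiling, x)) for x in samples]


def driver_graph(feeds, bass_hz=200.0, crossover_hz=2000.0):
    """Map three horn feeds to eight driver signals (2 woofers, 3 mids, 3 tweeters)."""
    # Combine the feeds and low-pass the sum to obtain the shared bass signal.
    summed = [sum(frame) for frame in zip(*feeds)]
    bass = limit(one_pole_lowpass(summed, bass_hz))
    out = {"woofer_top": bass, "woofer_bottom": list(bass)}
    for i, feed in enumerate(feeds, start=1):
        # Remove the bass band, then split the remainder into mid and high bands.
        lows = one_pole_lowpass(feed, bass_hz)
        above_bass = [x - b for x, b in zip(feed, lows)]
        mid = one_pole_lowpass(above_bass, crossover_hz)
        tweeter = [a - m for a, m in zip(above_bass, mid)]
        out[f"mid_{i}"] = limit(mid)
        out[f"tweeter_{i}"] = limit(tweeter)
    return out
```

Because each high band is formed by subtracting the low band from its input, the mid and tweeter signals for a horn sum back to that horn's feed with the bass removed, provided the limiters are not engaged.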

While various nested architectures employing a variety of spatial audio encoding techniques are described above, any of a number of spatial audio reproduction processes including (but not limited to) distributed spatial audio reproduction processes and/or spatial audio reproduction processes that utilize virtual speaker layouts to determine the manner in which to render spatial audio can be utilized as appropriate to the requirements of different applications in accordance with various embodiments of the invention. Furthermore, a number of different spatial location metadata formats and components are described above. It should be readily appreciated that the spatial layout metadata generated and distributed within a spatial audio system is not in any way limited to specific pieces of data and/or specific formats. The components and/or encoding of spatial layout metadata is largely dependent upon the requirements of a given application. Accordingly, it should be appreciated that any of the above nested architectures and/or spatial encoding techniques can be utilized in combination and are not limited to specific combinations. Furthermore, specific techniques can be utilized in processes other than those specifically disclosed herein in accordance with certain embodiments of the invention.

Much of the above discussion speaks generally to the characteristics of the many variants of cells that can be utilized within spatial audio systems in accordance with various embodiments of the invention. However, a number of cell configurations have specific advantages when utilized in spatial audio systems. Accordingly, several different techniques for constructing cells for use in spatial audio systems in accordance with various embodiments of the invention are discussed further below.

SECTION 5: Distribution of Audio Data within a Spatial Audio System

As noted above, multiple cells can be used to render spatial audio. A challenge for multi-cell configurations is managing the flow of data between cells. For example, audio must be rendered in a synchronized fashion in order to prevent an unpleasant listening experience. In order to provide a seamless, quality listening experience, cells can automatically form hierarchies to promote efficient data flow. Audio data for rendering spatial audio is carried between cells, but other data can be carried as well. For example, control information, position information, calibration information, as well as any other desired messaging between cells and control servers can be carried between cells as appropriate to the requirements of specific applications of embodiments of the invention.

Depending on the needs of the particular situation, different hierarchies for data transmission between cells can be established. In many embodiments, a primary cell is responsible for managing the flow of data, as well as the processing of input audio streams into audio streams for respective connected secondary cells managed by the primary cell. In numerous embodiments, multiple primary cells communicate with each other to synchronously manage multiple sets of secondary cells. In various embodiments, one or more primary cells can be designated as a super primary cell, which in turn controls data flow between primaries.

An example hierarchy with a super primary in accordance with an embodiment of the invention is illustrated in FIG. 53. As can be seen, a super primary cell (SP) obtains an audio stream from a wireless router. The super primary distributes the audio stream to connected primary cells (P) over a wireless network established between the cells. Each primary cell in turn processes the audio stream to create individual streams for the secondary cells that it governs as discussed above. These streams can be unicast to their destination secondary cell. Further, the super primary cell can perform all the actions of a primary cell, including generating audio streams for its governed secondary cells.

While the arrows illustrated are one directional, this refers only to the flow of audio data. All cell types can communicate with each other via the cell network. For example, if a secondary cell receives an input command such as (but not limited to) pause playback or skip track, the command can be propagated across the network up from the secondary cell. Further, primary cells and super primary cells may communicate with each other to pass metadata, time synchronization signals, and/or any other message as appropriate to the requirements of specific applications of embodiments of the invention. As can readily be appreciated, while primary cells in separate rooms are shown, primary cells can be within the same room depending on many factors including (but not limited to) size and layout of the room, and groupings of cells. Further, while clusters of three secondary cells to a primary cell are shown, any number of different secondary cells can be governed by a primary cell, including a configuration where a primary has no governed secondary cells.

Furthermore, as illustrated in accordance with an embodiment of the invention in FIG. 54, multiple super primary cells can be established which in turn push audio streams to their respective governed primary cells. In numerous embodiments, super primary cells can communicate between each other to control synchronization and share other data. In various embodiments, the super primary cells connect via the wireless router. Indeed, in many embodiments, a super primary cell can govern a primary cell via the wireless router. For example, if the primary cell is too far away to be able to efficiently communicate with the super primary cell, but is not itself a super primary cell, then it can be governed via a connection facilitated by the wireless router. Governance of a primary cell by a super primary cell via a wireless router in accordance with an embodiment of the invention is illustrated in FIG. 55.

Super primary cells are not a requirement of any hierarchy. In numerous embodiments, a number of primary cells can all directly receive audio streams from the wireless router (or any other input source). Additional information can be passed via the wireless router as well and/or directly between primary cells. A hierarchy with no super primary cells in accordance with an embodiment of the invention is illustrated in FIG. 56.

While several specific architectures have been illustrated above, as can readily be appreciated, many different hierarchy layouts can be used, with any number of super primary, primary, and secondary cells depending on the needs of a particular user. Indeed, in order to support robust, automatic hierarchy generation, cells can negotiate amongst each other to elect cells for specific roles. A process for electing primaries in accordance with an embodiment of the invention is illustrated in FIG. 57.

Process 5700 includes initializing (5710) a cell. Initializing a cell refers to a cell joining a network of cells, but can also refer to a lone cell beginning the network. In numerous embodiments, cells can be initialized more than once, for example, when being moved to a new room or when powering on; initialization is not restricted to a “first boot” scenario. If a connection to the Internet is available (5720), the cell can contact a control server and/or another network connected device from which grouping information can be obtained to sync (5730) grouping information. Grouping information can include (but is not limited to) information regarding the placement of other cells and their groupings (e.g. which cells are in which groups and/or zones). If another primary cell is advertised (5740) on the network, then the newly initialized cell becomes (5750) a secondary cell. However, if there are no primary cells advertised (5740) on the network, the newly initialized cell becomes (5760) a primary cell.

In order to discover the most efficient role for each cell across the network, the new primary cell publishes (5770) election criteria for becoming the new primary. In many embodiments, the election criteria include metrics regarding the performance of the current primary such as (but not limited to) operating temperature, available bandwidth, physical location and/or proximity to other cells, channel conditions, reliability of connection to the Internet, connection quality to secondary cells, and/or any other metric related to the operational efficiency of a cell in performing the primary role as appropriate to the requirements of specific applications of embodiments of the invention. In many embodiments, the metrics are not all weighted equally, with some metrics being more important than others. In various embodiments, the published election criteria include a threshold score based on the metrics, which if beaten, would signify a cell better suited to be a primary cell. If an election is made (5780) for a change in primary cell based on the published election criteria, then the primary cell migrates (5790) the role of primary to the elected cell, and becomes a secondary cell (5750). If no new cell is elected (5780), the primary cell maintains its role.
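The role assignment and election steps described above can be sketched as follows. This is a minimal illustration, assuming that each metric has been normalized to the range 0 to 1 with higher values being better and that the threshold score is the current primary's own weighted score plus an optional margin. The metric names, weights, class, and function names are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass

# Hypothetical weights: the metrics are not all weighted equally.
WEIGHTS = {"bandwidth": 0.4, "link_quality": 0.4, "thermal_headroom": 0.2}


@dataclass
class Cell:
    name: str
    metrics: dict
    role: str = "unassigned"


def score(metrics, weights=WEIGHTS):
    """Weighted sum of normalized metrics; missing metrics count as zero."""
    return sum(w * metrics.get(key, 0.0) for key, w in weights.items())


def initialize(cell, advertised_primary):
    """Steps 5740-5760: join as secondary if a primary is advertised, else become primary."""
    cell.role = "secondary" if advertised_primary is not None else "primary"
    return cell.role


def run_election(primary, candidates, margin=0.0):
    """Steps 5770-5790: migrate the primary role if a candidate beats the threshold."""
    threshold = score(primary.metrics) + margin
    best = max(candidates, key=lambda c: score(c.metrics), default=None)
    if best is not None and score(best.metrics) > threshold:
        primary.role, best.role = "secondary", "primary"
        return best
    return primary
```

A nonzero margin is one way to keep the primary role from migrating back and forth between two cells with nearly equal scores.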

In various embodiments, the election process is periodically repeated to maintain an efficient network hierarchy. In numerous embodiments, the election process can be triggered by events such as (but not limited to) initialization of new cells, indication that the primary cell is incapable of maintaining primary role performance, cells dropping from the network (due to powering down, signal interruption, cell failure, wireless router failure, etc.), physical relocation of a cell, presence of a new wireless network, or any of a number of other triggers as appropriate to the requirements of specific applications of embodiments of the invention. While a specific election process is illustrated in FIG. 57, it can be readily appreciated that any number of variations of election processes can be utilized, including variants that elect super primary cells, without departing from the scope or spirit of the invention.

SECTION 6: Construction of Cells

As noted above, cells in accordance with many embodiments of the invention are speakers capable of modifying a sound field with relatively equal precision across a 360° area surrounding the cell. In many embodiments, cells contain at least one halo containing a radially symmetrical arrangement of drivers. In numerous embodiments, each horn contains at least one tweeter and at least one mid. In a variety of embodiments, each horn contains a tweeter and a mid, coaxially aligned such that the tweeter is positioned exterior to the mid relative to the midpoint of the cell. However, halos can contain multiple tweeters and mids so long as the overall arrangement maintains radial symmetry for each driver type. Various driver arrangements are discussed further below. In many embodiments, each cell contains an upward-firing woofer and a downward-firing woofer coaxially aligned. However, several embodiments utilize only one woofer. A significant problem in many embodiments is that a stand for holding the cell may be required to go through one of the woofers. In order to address this structural issue, one of the woofers can have an open channel through the center of the driver to accommodate wiring and other connectors. In a number of embodiments, the woofers are symmetrical and both include a channel through the center of the driver. Particular woofer construction to address this unusual concern is discussed below.

Turning now to FIG. 18A, a cell in accordance with an embodiment of the invention is illustrated. The cell 1800 includes a halo 1810, a core 1820, a support structure (referred to as a “crown”) 1830, and lungs 1840. In many embodiments, the lungs constitute the exterior shell of the cell, and provide a sealed back enclosure for the woofers. The crown provides support and a seal for the woofer, and in many embodiments provides support to the lungs. The halo includes three horns positioned in a radially symmetric manner, and in many embodiments, includes apertures for microphones positioned between the horns. Each of these components is discussed in further detail from the inside out to provide an overview of both form and construction.

SECTION 6.1: Halos

Halos are rings of horns with seated drivers. In numerous embodiments, halos are radially symmetrical and can be manufactured to promote modal beamforming. However, beamforming can be accomplished with halos that are asymmetric and/or have different sizes and/or placements of horns. While there are many different arrangements of horns that would satisfy the function of a halo, the primary discussion of halos below is with respect to a three-horned halo. However, halos containing multiple horns can be utilized in accordance with many embodiments of the invention in order to provide different degrees of beam control. The horns can include multiple input apertures as well as structural acoustic components to assist with controlling sound dispersal. In many embodiments, the halo also contains apertures and/or support structures for microphones.

Turning now to FIG. 18B, a halo in accordance with an embodiment of the invention is illustrated. Halo 1810 includes three horns 1811. Each horn contains three apertures 1812. The halo further includes a set of three microphone apertures 1813 (two visible, one occluded in the provided view of the embodiment). A cross sectional view of the microphone aperture showing the housing for the microphone in accordance with an embodiment of the invention is illustrated in FIG. 18C. In many embodiments, the halo is manufactured as a complete object via a 3D printing process. However, halos can be constructed piecewise. In numerous embodiments, the three horns are oriented 120° apart such that they have threefold radial symmetry (or “trilateral symmetry”).

In numerous embodiments, each horn is connected to a tweeter and a mid driver. In many embodiments, the tweeter is exterior to the mid relative to the center point of the halo, and the two drivers are coaxially positioned. FIG. 18D illustrates an exploded view of a coaxial alignment of the tweeter and mid for a single horn of a halo in accordance with an embodiment of the invention. Tweeter 1814 is positioned exterior to the mid 1815. FIG. 18E illustrates a socketed set of tweeter/mid drivers for each horn in a halo in accordance with an embodiment of the invention.

In numerous embodiments, the tweeter is fitted into the center aperture of the horn, whereas the mid is configured to direct sound through the outer two apertures of the horn. Turning now to FIG. 18F, a horizontal cross section of a socketed set of tweeter/mid drivers for each horn in a halo is illustrated in accordance with an embodiment of the invention. As shown, the apertures can be utilized to provide additional separation of different frequencies generated by the driver. Further, the horn itself can include an acoustic structure 1816 in order to avoid internal multipath reflections. In many embodiments, the acoustic structure is a perforated grid. In some embodiments, the acoustic structure is a porous foam. In a number of embodiments, the acoustic structure is a lattice. The acoustic structure can prevent the passage of highs while admitting mids. In many embodiments, the acoustic structure assists in maintaining the directionality of the sound waves. In a variety of embodiments, the horns are constructed in such a way as to minimize the amount of sound dispersal outside of the 120° sector of the horn. In this way, each individual horn of the halo is primarily responsible for the cell's sound reproduction within a discrete 120° sector.
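The division of the 360° field into one sector per horn can be sketched as a simple lookup. The following is a minimal illustration only; the function name, the azimuth convention, and the assumption that the first horn is centered at 0° are hypothetical and are not taken from the disclosure.

```python
def horn_for_azimuth(azimuth_deg: float,
                     num_horns: int = 3,
                     first_horn_center_deg: float = 0.0) -> int:
    """Return the index of the horn whose sector contains the azimuth.

    Assumes (hypothetically) equally sized sectors, with horn 0 centered
    on first_horn_center_deg and indices increasing with azimuth.
    """
    sector = 360.0 / num_horns  # 120 degrees for a three-horned halo
    # Shift so horn 0's sector starts at offset 0, then wrap into [0, 360).
    offset = (azimuth_deg - first_horn_center_deg + sector / 2.0) % 360.0
    # The trailing modulo guards against a floating-point wrap to 360.0.
    return int(offset // sector) % num_horns
```

With the defaults, azimuths from -60° up to (but not including) 60° map to horn 0, 60° to 180° map to horn 1, and 180° to 300° map to horn 2.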

The microphone array situated in a halo can be used for multiple purposes, many of which will be discussed in further detail below. Among their many uses, the microphones can be used in conjunction with the directional capabilities of the cell to measure the environment via acoustic ranging. In many embodiments, the halo itself abuts a core component. A discussion of the core component is found below.

SECTION 6.2: The Core

Cells can utilize logic circuitry in order to process audio information and perform other computational processes, including, but not limited to, controlling drivers, directing playback, acquiring data, performing acoustic ranging, responding to commands, and managing network traffic. This logic circuitry can be contained on a circuit board. In many embodiments, the circuit board is an annulus. The circuit board may be made up of multiple annulus sector pieces. However, the circuit board can also take other shapes. In many embodiments, the center of the annulus is at least partially occupied by a roughly spherical housing (the “core housing”) that provides a back volume for the drivers connected to the halo. In numerous embodiments, the core housing includes two interlocking components.

A circuit board annulus and the bottom portion of the housing in accordance with an embodiment of the invention are illustrated in FIG. 18G. In the illustrated embodiment, the circuit board accompanies a set of pins to which various other components of the cell are mounted. In other embodiments, the circuit board is split into two or more separate annulus sectors. In a variety of embodiments, each sector is responsible for a different functional purpose. For example, in many embodiments, one sector is responsible for power supply, one sector is responsible for driving the drivers, and one sector is responsible for generic logic processing tasks. However, the functionality of the sectors or the circuit board in general is not restricted to any particular physical layout.

Turning now to FIG. 18H, a core section surrounded by a halo and drivers in accordance with an embodiment of the invention is illustrated. The core is shown with both the top and bottom housing components. In many embodiments, the housing components of the core are divided into three distinct volumes, each providing a separate back volume for the set of drivers associated with a particular horn in the halo. In a variety of embodiments, the core housing includes three divider walls that meet at the center of the core housing. While the core housing illustrated in FIG. 18H is roughly spherical, the core housing can be any shape as appropriate to the requirements of specific applications in accordance with various embodiments of the invention. Further, gaskets and/or other sealant methods can be used to form seals in order to prevent air movement between different sections. In many embodiments, surrounding the core and halo is the crown. Crowns are discussed below.

SECTION 6.3: The Crown

In many embodiments, as discussed above, cells include a pair of opposing, coaxial woofers. The crown can be a set of struts which support the woofers. In many embodiments, the crown is made of a top component and a bottom component. In numerous embodiments, the top component and bottom component are a single component that protrudes from both sides of the halo. In other embodiments, the top and bottom components can be separate pieces.

A crown positioned around a halo and core in accordance with an embodiment of the invention is illustrated in FIG. 18I. The crown may have “windows” or other cutouts in order to reduce weight and/or provide aesthetically pleasing designs. The crown may have gaskets and/or other seals to prevent air from escaping into other volumes within the cell. In the illustrated embodiment, the crown is surrounded by the lungs, which are discussed in further detail below.

SECTION 6.4: The Lungs

In many embodiments, the outer surface of the cell is the lungs. The lungs can provide many functions, including, but not limited to, providing a sealed back volume for the woofers, and protecting the interior of the cell. However, in numerous embodiments, additional components can be exterior to the lungs for either cosmetic or functional effect (e.g. connectors, stands, or any other function as appropriate to the requirements of specific applications in accordance with various embodiments of the invention). In numerous embodiments, the lungs are transparent, and enable a user to see inside the cell. However, the lungs can be opaque without impairing the functionality of the cell.

Turning now to FIG. 18J, a cell with lungs surrounding a crown, core, and halo in accordance with an embodiment of the invention is illustrated. Apertures can be provided in the lungs on the top and bottom of the cell to enable placement of woofers. A coaxial arrangement of woofers designed to fit into the apertures in accordance with an embodiment of the invention can be found in FIGS. 18K and 18L, which illustrate the top and bottom woofers, respectively. As can be seen, the top woofer is a conventional woofer, whereas the bottom woofer contains a hollow tunnel through the center. This is further illustrated in the cross sectional views of the top and bottom woofer illustrated respectively in FIGS. 18M and 18N. The channel through the bottom woofer can provide an access port for physical connectors to reach the exterior of the cell. In many embodiments, a “stem” extends from the cell through the channel which can connect to any number of different configurations of stands. In a variety of embodiments, power cabling and data transfer cabling are routed through the channel. A cell with a stem going through the channel is illustrated in accordance with an embodiment of the invention in FIG. 18O. A close up view of various ports on a stem in accordance with an embodiment of the invention is illustrated in FIG. 18P. Ports can include, but are not limited to, USB connectors, power connectors, and/or any other connector implemented in accordance with a data transfer connection protocol and/or standard as appropriate to the requirements of specific applications in accordance with various embodiments of the invention.

In order to maintain woofer functionality, a double surround can be used to keep the channel 1820 open while keeping the woofer sealed. Further, in many embodiments, a gasket used to seal the bottom woofer can be extended to cover the frame to reinforce the seal. However, in many embodiments, a cell may only have a single woofer. Due to the nature of low frequency sound, many spatial audio renderings may not require opposing woofers. In such a case, a channel may not be required as the bottom (or top) may not have a woofer. Further, in many embodiments, additional structural elements can be utilized on the exterior of the cell that provide alternative connections to stands, or may in fact be stands themselves. In such a case where a stem is not connected through the bottom of the cell, a conventional woofer could be used instead. In many embodiments, the diaphragm (or cone) of the woofer is constructed out of a triaxial carbon fiber weave which has a high stiffness-to-weight ratio. However, the diaphragm can be constructed out of any material appropriate for a woofer as appropriate to the requirements of specific applications of embodiments of the invention. Further, in numerous embodiments, a cell can be made to be completely sealed with no external ports by use of induction-based power systems and wireless data connectivity. However, a cell can retain these functions while still providing physical ports. Stems are discussed in further detail below.

SECTION 6.5: Stems

As noted above, in numerous embodiments, cells include stems which can serve any of a number of functions, including, but not limited to, supporting the body of the cell, providing a surface for placing controls, providing connections to stands, providing a location for connectors, and/or any of a number of other functions as appropriate to the requirements of specific applications of embodiments of the invention. Indeed, while in many embodiments, cells can be operated remotely via control devices, in various embodiments, cells can be operated directly via physical controls connected to the cell such as, but not limited to, buttons, toggles, dials, switches, and/or any other physical control method as appropriate to the requirements of specific applications of embodiments of the invention. In numerous embodiments, a “control ring” located on the stem can be used to directly control the cell.

Turning now to FIG. 20, a control ring on a stem is illustrated in accordance with an embodiment of the invention. Control rings are rings that can be manipulated to send control signals to a cell, similar to a control device. Control rings can be rotated (e.g. twisted), pulled up, pushed down, pushed (e.g. “clicked,” or pressed perpendicularly to the axis of the stem), and/or any other manipulation as appropriate to the requirements of specific applications of embodiments of the invention. A cross section of an example control ring showing the interior mechanics in accordance with an embodiment of the invention is illustrated in FIG. 21. Different mechanical components are discussed below with respect to the actions with which they are associated.

In numerous embodiments, rotating can be used as a method of control. While rotating can indicate a number of different controls as appropriate to the requirements of specific applications of embodiments of the invention, in many embodiments, the rotating motion can be used to change volume and/or skip tracks. FIG. 22 indicates the mechanical structures involved with registering a rotation of the control ring in accordance with an embodiment of the invention. FIG. 23 is a close-up view of the particular component. A disk containing an alternating sensible surface is connected to the ring, which when rotated, moves the alternating sensible surface across a sensor. The rotation can be sensed by the sensor by measuring the alternating surface. In numerous embodiments, the alternating sensible surface is made of magnets, and the sensor detects the changing magnetic field. In various embodiments, the alternating sensible surface is an alternating colored surface which is sensed via an optical sensor. However, any number of different sensing schemes can be utilized as appropriate to the requirements of specific applications of embodiments of the invention. Furthermore, in numerous embodiments, the alternating sensible surface is an annulus rather than a disk.
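One way to picture how an alternating surface yields a rotation measurement is to count the transitions seen by the sensor. The sketch below is illustrative only: the function names, the 0.5 threshold, and the number of segments per revolution are assumptions, and a single sensor of this kind reports the amount of rotation but not its direction.

```python
from typing import Iterable


def count_transitions(samples: Iterable[float], threshold: float = 0.5) -> int:
    """Count how many times the sensor reading crosses the threshold.

    Each crossing corresponds to one boundary between adjacent segments
    of the alternating surface passing the sensor.
    """
    count = 0
    previous = None
    for sample in samples:
        state = sample > threshold  # True over one segment type, False over the other
        if previous is not None and state != previous:
            count += 1
        previous = state
    return count


def rotation_degrees(transitions: int, segments_per_revolution: int) -> float:
    """Convert a transition count into degrees of rotation (magnitude only)."""
    return 360.0 * transitions / segments_per_revolution
```

For example, three transitions on a hypothetical surface with 24 segments per revolution correspond to 45 degrees of rotation. Determining direction would need additional information, such as a second sensor offset from the first.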

In a variety of embodiments, forcing a control ring off center, or “clicking,” can be used as a method of control. FIG. 24 illustrates “clicking” a control ring in accordance with an embodiment of the invention. In many embodiments, a radial push is resisted by race springs while a static ramp engages a conical washer (also referred to as a “Belleville washer”) causing it to invert, which is then detected. In several embodiments, when the washer inverts, a ring of carbon pill material presses against an electrode pattern and shorts two contact rings. The short can be measured and recorded as a click. A carbon pill membrane with associated electrodes under a conical washer in an inverted “clicked” position in accordance with an embodiment of the invention is illustrated in FIG. 25. However, any number of different detection methods can be used as appropriate to the requirements of specific applications of embodiments of the invention.

In many embodiments, moving the control ring vertically along the stem can be used as a method of control. An example mechanical structure for registering vertical movement in accordance with an embodiment of the invention is illustrated in FIG. 26. In a number of embodiments, the vertical movement of the control ring can be measured by revealing a flag which can in turn be detected via an opto-interrupter. In many embodiments, a proximity sensor is used instead of, or in conjunction with, an opto-interrupter. The space created for revealing a flag in accordance with an embodiment of the invention is illustrated in FIG. 27. In a variety of embodiments, the movement can be detected mechanically via a physical switch or circuit short such as with respect to a click. One of ordinary skill in the art can appreciate that there are any number of ways to detect movement as appropriate to the requirements of specific applications of embodiments of the invention.

Once a control ring has been moved from its resting position via a vertical movement, a rotation on the new plane can be used as a different control than rotation on the resting plane. In many embodiments, a rotation on the second plane is referred to as a “twist,” and is detected when the rotation achieves a set angle. In many embodiments, a clutch is engaged when the control ring is moved to a second plane, and can be moved relative to a separate clutch plate. In a variety of embodiments, a torsion spring can be used to resist motion while an integrated detent spring can provide a detent at the end of travel to enhance feel and/or prevent accidental movement. For example, a twist of 120 degrees (or any arbitrary number of degrees) can be registered using snap dome switches at the end of a track. An example configuration of a clutch body and clutch plate in accordance with an embodiment of the invention is illustrated in FIG. 28. However, any number of different rotation methods can be used as appropriate to the requirements of specific applications of embodiments of the invention. An advantage to the discussed mechanisms is that they can be implemented with a passage in the middle to accommodate components that may pass through the stem.
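The distinction between rotation on the resting plane and a “twist” on a second plane can be sketched as a small decision function. This is a hedged illustration rather than the disclosed mechanism: the `Plane` names, the returned labels, the sign convention, and the 120 degree default threshold are all assumptions chosen for the example.

```python
from enum import Enum
from typing import Optional


class Plane(Enum):
    """Hypothetical vertical positions of the control ring."""
    RESTING = "resting"
    RAISED = "raised"
    LOWERED = "lowered"


def interpret_rotation(plane: Plane,
                       angle_deg: float,
                       twist_threshold_deg: float = 120.0) -> Optional[str]:
    """Classify a rotation of the control ring.

    On the resting plane any rotation is an ordinary rotate control.
    On a second plane, the rotation only registers as a twist once it
    reaches the set angle; smaller movements are ignored.
    Positive angles are (by assumption) clockwise.
    """
    if plane is Plane.RESTING:
        return "rotate"
    if abs(angle_deg) >= twist_threshold_deg:
        return "twist_clockwise" if angle_deg > 0 else "twist_counterclockwise"
    return None
```

A rotation of 90 degrees on a raised ring returns `None` under these assumptions, while the same ring turned 120 degrees registers as a clockwise twist.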

Stems further can lock into stands. In numerous embodiments, a bayonet based locking system is used, where a bayonet located on the stem travels into a housing in the stand to fix the connection. An example bayonet locking system in accordance with an embodiment of the invention is illustrated in FIG. 29. As illustrated, the stem has several bayonets that are pointed on one side, and the stand has a track formed by two surfaces which form bayonet shaped housings at the end of a track. In many embodiments, the number of bayonets matches the number of housings; however, so long as at least one bayonet matches to a housing, and no other bayonets (if present) collide with the surfaces such that the connection is off balance, the connection can be stable. If the stem and the stand are not aligned such that the bayonets can drop into the track, the stand or stem can be rotated such that they all fall into the track. In various embodiments, when twisted, the pointed end of the bayonet pushes open the two surfaces to reach and drop into the housing, after which the two surfaces can be forced together via springs in order to close the track. This can lock the stem into the stand, and prevent unwanted motion or removal under normal forces. A cross section of a stand and stem locked together using a bayonet based locking system in accordance with an embodiment of the invention is illustrated in FIG. 30.

In order to remove the stem from the stand, the two surfaces can be separated again to form a track which the bayonets can be backed out of and removed. In various embodiments, one of the surfaces can be pushed up or down. In many embodiments, this is achieved using a set of loaded springs which are manipulable by a user. An example implementation is illustrated in accordance with an embodiment of the invention in FIGS. 31A and 31B. Positional bi-stability can be achieved using springs on a lock plate engaged with a tab. By sliding a plate, the user can move one of the surfaces by applying the appropriate force against the springs. FIG. 31A shows the mechanism in the locked position, whereas FIG. 31B shows the mechanism in the unlocked position. However, one of ordinary skill in the art can appreciate that any number of configurations can be utilized for bayonet based locking systems as appropriate to the requirements of specific applications of embodiments of the invention. Indeed, one of ordinary skill in the art can appreciate that any number of locking systems can be used aside from bayonet based locking systems to secure stems to stands without departing from the scope or spirit of the invention.

Putting together the above described components can yield a functional cell. Turning now to FIGS. 18Q and 18R, FIG. 18Q is a cross section of a complete cell and FIG. 18R is an exploded view of the complete cell in accordance with an embodiment of the invention. While a particular embodiment of a cell is illustrated with respect to FIGS. 18A-R, cells can take any number of different configurations, including, but not limited to, having different numbers of drivers, different horn configurations, replacing horns with other driver configurations including (but not limited to) tetrahedral driver configurations, lacking a stem, and/or different overall form factors. In many embodiments, cells are supported by support structures. A non-exclusive set of example support structures in accordance with embodiments of the invention is illustrated in FIGS. 19A-D.

SECTION 6.6: Cell Circuitry

Turning now to FIG. 32, a block diagram for cell circuitry in accordance with an embodiment of the invention is illustrated. Cell 3200 includes processing circuitry 3210. Processing circuitry can include any number of different logic processing circuits such as, but not limited to, processors, microprocessors, central processing units, parallel processing units, graphics processing units, application specific integrated circuits, field-programmable gate-arrays, and/or any other processing circuitry capable of performing spatial audio processes as appropriate to the requirements of specific applications in accordance with various embodiments of the invention.

Cell 3200 can further include an input/output (I/O) interface 3220. In many embodiments, the I/O interface includes a variety of different ports and can communicate using a variety of different methodologies. In numerous embodiments, the I/O interface includes a wireless networking device capable of establishing an ad hoc network and/or connecting to other wireless networking access points. In a variety of embodiments, the I/O interface has physical ports for establishing wired connections. However, I/O interfaces can include any number of different types of technologies capable of transferring data between devices. Cell 3200 further includes clock circuitry 3230. In many embodiments, the clock circuitry includes a quartz oscillator.

Cell 3200 can further include driver signal circuitry 3235. Driver signal circuitry is any circuitry capable of providing an audio signal to a driver in order to make the driver produce audio. In many embodiments, each driver has its own portion of the driver circuitry.

Cell 3200 can also include a memory 3240. Memory can be volatile memory, non-volatile memory, or a combination of volatile and non-volatile memory. Memory 3240 can store an audio player application such as (but not limited to) a spatial audio rendering application 3242. In numerous embodiments, spatial audio rendering applications can direct the processing circuitry to perform various spatial audio rendering tasks such as, but not limited to, those described herein. In numerous embodiments, the memory further includes map data 3244. Map data can describe the location of various cells within a space, the location of walls, floors, ceilings, and other barriers and/or objects in the space, and/or the placement of virtual speakers. In many embodiments, multiple sets of map data may be utilized in order to compartmentalize different pieces of information. In a variety of embodiments, the memory 3240 also includes audio data 3246. Audio data can include one or more pieces of audio content that can contain any number of different audio tracks and/or channels. In a variety of embodiments, audio data can include metadata describing the audio tracks such as, but not limited to, channel information, content information, genre information, track importance information, and/or any other metadata that can describe an audio track as appropriate to the requirements of specific applications in accordance with various embodiments of the invention. In many embodiments, audio tracks are mixed in accordance with an audio format. However, audio tracks can also represent individual, unmixed channels.

Memory can further include sound object position data 3248. Sound object position data describes the desired location of a sound object in the space. In some embodiments, sound objects are located at the position of each speaker in a conventional speaker arrangement ideal for the audio data. However, sound objects can be designated for any number of different audio tracks and/or channels and can be similarly located at any desired point.
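As an illustration of sound object position data, the sketch below places one sound object per channel at the positions of a conventional stereo speaker arrangement. It is a minimal example under stated assumptions: the `SoundObject` record, the two-dimensional coordinate convention (y pointing straight ahead of the listener), and the use of azimuths of 30 degrees to either side for a stereo pair are illustrative choices, not details taken from the disclosure.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SoundObject:
    """Hypothetical record for one entry of sound object position data."""
    channel: str
    x: float  # meters, positive to the listener's right
    y: float  # meters, positive straight ahead of the listener


# Assumed azimuths in degrees (0 = straight ahead, positive = right).
STEREO_AZIMUTHS: Dict[str, float] = {"left": -30.0, "right": 30.0}


def place_sound_objects(azimuths: Dict[str, float],
                        distance_m: float,
                        listener: Tuple[float, float] = (0.0, 0.0)) -> List[SoundObject]:
    """Place one sound object per channel on a circle around the listener."""
    objects = []
    for channel, azimuth_deg in azimuths.items():
        angle = math.radians(azimuth_deg)
        objects.append(SoundObject(
            channel=channel,
            x=listener[0] + distance_m * math.sin(angle),
            y=listener[1] + distance_m * math.cos(angle),
        ))
    return objects
```

Because the positions are plain coordinates, the same structure could hold sound objects placed at any other desired point, consistent with the flexibility described above.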

FIG. 33 illustrates an example of a hardware implementation for an apparatus 3300 employing a processing system 3320 that may be used to implement a cell configured in accordance with various aspects of the disclosure for the system and architecture for spatial audio control and reproduction. In accordance with various aspects of the disclosure, an element, or any portion of an element, or any combination of elements in the apparatus 3300 that may be used to implement any device, including a cell, may utilize the spatial audio approach described herein.

The apparatus 3300 may be used to implement a cell. The apparatus 3300 includes a set of spatial audio control and production modules 3310 that includes a system encoder 3312, a system decoder 3332, a cell encoder 3352, and a cell decoder 3372. The apparatus 3300 can also include a set of drivers 3392. The set of drivers 3392 may include one or more subsets of drivers that include one or more different types of drivers. The drivers 3392 can be driven by driver circuitry 3390 that generates the electrical audio signals for each of the drivers. The driver circuitry 3390 may include any bandpass or crossover circuits that may divide audio signals for different types of drivers.
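The idea of a crossover dividing an audio signal between driver types can be sketched in software with a complementary two-band split. This is only an illustration: it uses a first-order (one-pole) low-pass filter, which is far gentler than a practical crossover, and the function name and parameters are assumptions. The high band is formed by subtracting the low band from the input, so the two bands always sum back to the original signal.

```python
import math
from typing import List, Sequence, Tuple


def split_bands(samples: Sequence[float],
                cutoff_hz: float,
                sample_rate_hz: float) -> Tuple[List[float], List[float]]:
    """Split a signal into complementary low and high bands.

    The low band is a one-pole low-pass of the input; the high band is
    the remainder, so low[i] + high[i] reproduces samples[i].
    """
    # Smoothing coefficient of a one-pole low-pass filter.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    low: List[float] = []
    high: List[float] = []
    state = 0.0
    for x in samples:
        state += alpha * (x - state)  # low-pass filter update
        low.append(state)
        high.append(x - state)        # everything the low-pass removed
    return low, high
```

In such a sketch the low band would feed a woofer and the high band a mid or tweeter; a three-way division could apply the same split a second time to the high band.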

In various aspects of the disclosure, as illustrated by the apparatus 3300, each cell may include a system encoder and a system decoder such that system-level functionality and processing of related information may be distributed over the group of cells. This distributed architecture can also minimize the amount of data that needs to be transferred between each of the cells. In other implementations, each cell may only include a cell encoder and a cell decoder, but neither a system encoder nor a system decoder. In various embodiments, secondary cells only utilize their cell encoder and cell decoder.
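The choice of which modules a cell exercises can be sketched as a function of its role. The sketch below assumes, for illustration only, that a primary cell is the one that runs the system-level stages when it has them; the role strings, module names, and the `has_system_codec` flag are hypothetical.

```python
from typing import List


def active_modules(role: str, has_system_codec: bool = True) -> List[str]:
    """Return the processing modules a cell would use for a given role.

    Secondary cells use only their cell encoder and cell decoder. A
    primary cell additionally uses the system encoder and decoder,
    provided the cell includes them (an assumption of this sketch).
    """
    cell_modules = ["cell_encoder", "cell_decoder"]
    if role == "primary" and has_system_codec:
        return ["system_encoder", "system_decoder"] + cell_modules
    return cell_modules
```

Under these assumptions a cell that lacks the system-level modules behaves like a secondary cell regardless of its role.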

The processing system 3320 can include one or more processors illustrated as a processor 3314. Examples of processors 3314 can include (but are not limited to) microprocessors, microcontrollers, digital signal processors (DSPs), field programmable gate arrays (FPGAs), programmable logic devices (PLDs), state machines, gated logic, discrete hardware circuits, and/or other suitable hardware configured to perform the various functionality described throughout this disclosure.

The apparatus 3300 may be implemented as having a bus architecture, represented generally by a bus 3322. The bus 3322 may include any number of interconnecting buses and/or bridges depending on the specific application of the apparatus 3300 and overall design constraints. The bus 3322 can link together various circuits including the processing system 3320, which can include the one or more processors (represented generally by the processor 3314) and a memory 3318, and computer-readable media (represented generally by a computer-readable medium 3316). The bus 3322 may also link various other circuits such as timing sources, peripherals, voltage regulators, and/or power management circuits, which are well known in the art, and therefore, will not be described any further. A bus interface (not shown) can provide an interface between the bus 3322 and a network adapter 3342. The network adapter 3342 provides a means for communicating with various other apparatus over a transmission medium. Depending upon the nature of the apparatus, a user interface (e.g., keypad, display, speaker, microphone, joystick) may also be provided.

The processor 3314 is responsible for managing the bus 3322 and general processing, including execution of software that may be stored on the computer-readable medium 3316 or the memory 3318. The software, when executed by the processor 3314, can cause the apparatus 3300 to perform the various functions described herein for any particular apparatus. Software shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise.

The computer-readable medium 3316 or the memory 3318 may also be used for storing data that is manipulated by the processor 3314 when executing software. The computer-readable medium 3316 may be a non-transitory computer-readable medium such as a computer-readable storage medium. A non-transitory computer-readable medium includes, by way of example, a magnetic storage device (e.g., hard disk, floppy disk, magnetic strip), an optical disk (e.g., a compact disc (CD) or a digital versatile disc (DVD)), a smart card, a flash memory device (e.g., a card, a stick, or a key drive), a random access memory (RAM), a read only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), a register, a removable disk, and any other suitable medium for storing software and/or instructions that may be accessed and read by a computer. The computer-readable medium may also include, by way of example, a carrier wave, a transmission line, and any other suitable medium for transmitting software and/or instructions that may be accessed and read by a computer. Although illustrated as residing in the apparatus 3300, the computer-readable medium 3316 may reside externally to the apparatus 3300, or be distributed across multiple entities including the apparatus 3300. The computer-readable medium 3316 may be embodied in a computer program product. By way of example, a computer program product may include a computer-readable medium in packaging materials. Those skilled in the art will recognize how best to implement the described functionality presented throughout this disclosure depending on the particular application and the overall design constraints imposed on the overall system.

FIG. 34 illustrates the source manager 3400 configured in accordance with various aspects of the disclosure that receives the multimedia input 3402. The multimedia input 3402 may include multimedia content 3412, multimedia metadata 3414, sensor data 3416, and/or preset/history information 3418. The source manager 3400 can also receive user interaction 3404 that may directly manage playback of the multimedia content 3412, including affecting selection of a source of multimedia content and managing rendering of that source of multimedia content. As further discussed herein, the multimedia content 3412, the multimedia metadata 3414, the sensor data 3416, and the preset/history information 3418 may be used by the source manager 3400 to generate and manage content 3448 and rendering information 3450.

The multimedia content 3412 and the multimedia metadata 3414 related thereto may be referred to herein as “multimedia data.” The source manager 3400 includes a source selector 3422 and a source preprocessor 3424 that may be used by the source manager 3400 to select one or more sources in the multimedia data and perform any preprocessing needed to provide the content 3448. The content 3448 is provided to the multimedia rendering engine along with the rendering information 3450 generated by the other components of the source manager 3400, as described herein.

The multimedia content 3412 and the multimedia metadata 3414 may be multimedia data from such sources as High-Definition Multimedia Interface (HDMI), Universal Serial Bus (USB), analog interfaces (phono/RCA plugs, stereo/headphone/headset plugs), as well as streaming sources using the AirPlay protocol developed by Apple Inc. or the Chromecast protocol developed by Google. In general, these sources may provide sound information in a variety of content and formats, including channel-based sound information (e.g., Dolby Digital, Dolby Digital Plus, and Dolby Atmos, as developed by Dolby Laboratories, Inc.), discrete sound objects, sound fields, etc. Other multimedia data can include text-to-speech (TTS) or alarm sounds generated by a connected device or another module within the spatial multimedia reproduction system (not shown).

The source manager 3400 further includes an enumeration determinator 3442, a position manager 3444, and an interaction manager 3446. Together, these components can be used to generate the rendering information 3450 that is provided to the multimedia rendering engine. As further described herein, the sensor data 3416 and the preset/history information 3418, which may be referred to generally as “control data,” may be used by these modules to affect playback of the multimedia content 3412 by providing the rendering information 3450 to the multimedia rendering engine. In one aspect of the disclosure, the rendering information 3450 contains telemetry and control information as to how the multimedia rendering engine should play back the multimedia in the content 3448. Thus, the rendering information 3450 may specifically direct how the multimedia rendering engine is to reproduce the content 3448 received from the source manager 3400. In other aspects of the disclosure, the multimedia rendering engine may make the ultimate determination as to how to render the content 3448.

The enumeration determinator module 3442 is responsible for determining the number of sources in the multimedia information included in the content 3448. This may include multiple channels from a single source, such as, for example, two channels from a stereo sound source, as well as TTS or alarm/alert sounds such as those that may be generated by the system. In one aspect of the disclosure, the number of channels in each content source is part of the determination of the number of sources to produce the enumeration information. The enumeration information may be used in determining the arrangement and mixing of the sources in the content 3448.
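By way of a non-limiting illustration, the enumeration described above can be sketched as a per-channel count over the content sources. The `ContentSource` type and the `enumerate_sources` function are hypothetical names introduced only for this sketch, and counting exactly one source per channel is an assumption rather than a requirement of the disclosure.

```python
from dataclasses import dataclass


@dataclass
class ContentSource:
    """Hypothetical description of one content source."""
    name: str
    channels: int  # e.g. 2 for a stereo source, 1 for a TTS or alarm sound


def enumerate_sources(sources):
    """Return the number of sources, counting each channel once (assumption)."""
    return sum(source.channels for source in sources)


# A stereo song playing alongside a system-generated alarm.
sources = [ContentSource("stereo song", 2), ContentSource("alarm", 1)]
source_count = enumerate_sources(sources)
```

In this sketch the stereo song contributes two sources and the alarm contributes one, so the enumeration information handed to arrangement and mixing would be a count of three.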

The position manager 3444 can manage the arrangement of reproduction of the sources in the multimedia information included in the content 3448 using a desired position of reproduction for each source. A desired position may be based on various factors, including the type of content being played, positional information of the user or an associated device, and historical/predicted position information. With reference to FIG. 35, the position manager 3544 may determine position information used for rendering multimedia sources based on information from a user voice input 3512, an object augmented reality (A/R) input 3514, a UI position input 3516, and last/predicted position information associated with a particular input type 3518. The positional information may be generated in a position determination process using such approaches as a simultaneous localization and mapping (SLAM) algorithm. For example, the desired position for playback in a room may be based on a determination of a user's location in the room. This may include detecting the user voice 3512 or, alternatively, a received signal strength indicator (RSSI) of a user device (e.g., a user's smartphone).

The playback location may be based on the object A/R 3514, which may be information for an A/R object in a particular rendering for a room. Thus, the playback position of a sound source may match the A/R object. In addition, the system may determine where cells are using visual detection and, through a combination of scene detection and view of the A/R object being rendered, the playback position may be adjusted accordingly.

The playback position of a sound source may be adjusted based on a user interacting with a user interface through the UI position input 3516. For example, the user may interact with an app that includes a visual representation of the room in which a sound object is to be reproduced as well as the sound object itself. The user may then move the visual representation of the sound object to position the playback of the sound object in the room.

The location of playback may also be based on other factors such as the last playback location of a particular sound source or type of sound source 3518. In general, the playback location may be based on a prediction based on factors including (but not limited to) the type of the content, time of day, and/or other heuristic information. For example, the position manager 3544 may initiate playback of an audio book in a bedroom because the user typically plays back the audio book at night. As another example, a timer or reminder alarm may be played back in the kitchen if the user requests a timer be set while the user is in the kitchen.
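As an illustrative sketch only, a prediction of this kind could be made by looking up the room most often used in the past for the same type of content at the same part of the day. The `predict_room` and `part_of_day` functions, the history record layout, and the 21:00 to 06:00 definition of night are all assumptions made for this example and are not specified by the disclosure.

```python
from collections import Counter


def part_of_day(hour):
    """Bucket an hour (0-23) into 'night' or 'day' (assumed boundaries)."""
    return "night" if hour >= 21 or hour < 6 else "day"


def predict_room(history, content_type, hour):
    """Return the most frequent past room for this content type and part of day.

    history is a list of (content_type, hour, room) tuples (hypothetical layout).
    Returns None when there is no matching history.
    """
    bucket = part_of_day(hour)
    rooms = Counter(
        room
        for past_type, past_hour, room in history
        if past_type == content_type and part_of_day(past_hour) == bucket
    )
    return rooms.most_common(1)[0][0] if rooms else None


history = [
    ("audiobook", 22, "bedroom"),
    ("audiobook", 23, "bedroom"),
    ("audiobook", 14, "living room"),
    ("music", 22, "kitchen"),
]
predicted = predict_room(history, "audiobook", 22)
```

With the sample history above, an audio book started at night is predicted to play in the bedroom, mirroring the example in the preceding paragraph.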

In general, the position information sources may be classified into active or passive sources. Active sources refer to positional information sources provided by a user. These sources may include user location and object location. In contrast, passive sources are positional information sources that are not actively specified by users but are used by the position manager 3544 to predict playback position. These passive sources may include type of content, time of day, day of the week, and heuristic information. In addition, a priority level may be associated with each content source. For example, alarms and alerts may have a higher level of associated priority than other content sources, which may mean that these are played at higher volumes if they are being played in a position next to other content sources.

The desired playback location may be dynamically updated as the multimedia is reproduced by the multimedia rendering engine. For example, playback of music may “follow” a user around a room by the spatial multimedia reproduction system receiving updated positional information of the user or a device being carried by the user.

An interaction manager 3446 can manage how each of the different multimedia sources is to be reproduced based on their interaction with each other. In accordance with one aspect of the disclosure, playback of a multimedia source such as a sound source may be paused, stopped, or reduced in volume (also referred to as “ducked”). For example, where an alarm needs to be rendered during playback of an existing multimedia source, such as a song, an interaction manager may pause or duck the song while the alarm is being played.
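The ducking behavior described above can be sketched, purely for illustration, as assigning a reduced gain to every source whose priority is below the highest priority currently playing. The `duck_gains` function, the integer priority scale, and the default duck gain of 0.25 are assumptions introduced for this sketch.

```python
def duck_gains(priorities, duck_gain=0.25):
    """Map each source name to a playback gain.

    priorities maps source name -> integer priority (higher wins; assumed scale).
    Sources at the highest active priority play at full gain (1.0); all other
    sources are ducked to duck_gain.
    """
    if not priorities:
        return {}
    top = max(priorities.values())
    return {
        name: 1.0 if priority == top else duck_gain
        for name, priority in priorities.items()
    }


# An alarm arrives while a song is playing: the song is ducked.
gains = duck_gains({"song": 0, "alarm": 1})
```

Setting `duck_gain` to 0.0 in this sketch would approximate pausing or muting the lower-priority source rather than ducking it.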

SECTION 7: UI/UX and Additional Functionality

Spatial audio systems in accordance with many embodiments of the invention include user interfaces (UIs) to enable users to interact with and control spatial audio rendering. In several embodiments, a variety of UI modalities can be provided to enable users to interact with spatial audio systems in various ways including (but not limited to) direct interaction with a cell via buttons, a gesture based UI, and/or a voice activated UI, and/or interaction with an additional device such as (but not limited to) a mobile device or a voice assistant device via buttons, a gesture based UI, and/or a voice activated UI. In numerous embodiments, UIs can provide access to any number of functions including, but not limited to, controlling playback, mixing audio, placing audio objects in a space, configuring spatial audio systems, and/or any other spatial audio system function as appropriate to the requirements of specific applications. While the below reflect several different versions of UIs for various functions, one of ordinary skill in the art can appreciate that any number of different UI layouts and/or affordances can be used to provide users with access to and control over spatial audio system functionality.

Turning now to FIG. 36, a UI for controlling the placement of sound objects in a space in accordance with an embodiment of the invention is illustrated. As shown, cells can be graphically represented in their approximate location in a virtual space as an analog to the physical space. In numerous embodiments, different sound objects can be created and associated with different audio sources. In the case of a channel-based audio source, a separate audio object can be created for different channels (often with the bass mixed into all channels). Each spatial audio object can be represented by a different UI object having a different graphical representation (e.g. color). Indeed, the graphical representations can be differentiated in any number of ways including, but not limited to, shape, size, animation, symbol, and/or any other differentiating mark as appropriate to the requirements of specific applications. Sound objects can be moved throughout the virtual space which can result in a perceived “movement” of the sound object in the physical space when rendered by the spatial audio system using a process similar to any of the various spatial audio reproduction processes described above. In many embodiments, moving sound objects can be achieved via a “click-and-drag” operation, however any number of different interface techniques can be used.

Turning now to FIGS. 37A and 37B, a second UI for controlling the placement of sound objects in accordance with an embodiment of the invention is illustrated. The illustrated embodiment demonstrates a UI capable of enabling the splitting and merging of sound objects. In numerous embodiments, a single sound object can represent more than one audio source and/or audio channel. In various embodiments, each audio object can represent one or more instruments, for example, as in a “master” recording. FIG. 37A demonstrates a sound object that has been assigned audio tracks for four different instruments, in this case vocals, guitar, cello, and keyboard. Of course, any number of different instruments or arbitrary audio tracks can be assigned as appropriate to the requirements of specific applications in accordance with various embodiments of the invention. A button and/or other affordance can be provided to enable the user to “split” the sound object into multiple sound objects which can each reflect one or more of the channels in the original sound object. As seen in FIG. 37B, the sound object is split into four separate sound objects which can be independently placed, each representing a single instrument. A button and/or interface object can be provided to enable the merging of different sound objects in a similar manner.
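The split and merge operations of FIGS. 37A and 37B can be sketched as follows. This is a minimal illustration under stated assumptions: the `SoundObject` type and the `split` and `merge` functions are hypothetical, splitting is assumed to produce one object per track at the original position, and merging is assumed to place the merged object at the centroid of its parts.

```python
from dataclasses import dataclass


@dataclass
class SoundObject:
    """Hypothetical sound object: a set of audio tracks at a 2D position."""
    tracks: tuple
    position: tuple  # (x, y) in the virtual space


def split(obj):
    """Split a sound object into one object per track, all at the same position."""
    return [SoundObject((track,), obj.position) for track in obj.tracks]


def merge(objects):
    """Merge sound objects into one, placed at the centroid (assumption)."""
    tracks = tuple(track for obj in objects for track in obj.tracks)
    count = len(objects)
    x = sum(obj.position[0] for obj in objects) / count
    y = sum(obj.position[1] for obj in objects) / count
    return SoundObject(tracks, (x, y))


master = SoundObject(("vocals", "guitar", "cello", "keyboard"), (1.0, 2.0))
parts = split(master)     # four single-instrument objects
rejoined = merge(parts)   # one object carrying all four tracks again
```

After a split, each resulting object can be moved independently before any later merge, at which point the centroid rule used here is only one of many reasonable placement choices.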

Turning now to FIG. 38, a UI element for controlling the volume and rendering of sound objects in accordance with an embodiment of the invention is illustrated. In numerous embodiments, each sound object can be associated with a volume control. In the illustrated embodiment, volume sliders are provided. However, any of a number of different volume control schemes can be used as appropriate to the requirements of specific applications in accordance with various embodiments of the invention. In several embodiments, a single sound control can be associated with multiple sound objects. It should be readily appreciated that independently controlling sound objects is different from independently controlling individual speakers. Controlling the volume of a single sound object can impact the manner in which audio is rendered by multiple speakers in a manner determined by a spatial audio reproduction process such as (but not limited to) the various nested architectures described above. In embodiments in which virtual speakers are utilized within a spatial audio reproduction process, buttons can be provided in order to change between various preset virtual speaker configurations impacting number and/or placement of the virtual speakers relative to cells. In many embodiments, audio control buttons and/or affordances such as, but not limited to, play, pause, skip, seek, and/or any other sound control can be provided as part of a UI.

Spatial audio objects can further be viewed in an augmented reality manner. In numerous embodiments, control devices can have augmented reality capabilities, and sound objects can be visualized. Turning now to FIG. 39, a sound object representing an audio track being played along with album art in accordance with an embodiment of the invention is illustrated. However, the track can be represented in any number of different ways, including those without art, with different shapes, those that are more abstract, and/or any other graphical representation as appropriate to the requirements of specific applications of various embodiments of the invention. For example, FIG. 40 illustrates three different visualizations of abstract representations of audio objects in accordance with an embodiment of the invention. As one of ordinary skill in the art can appreciate, there are any number of different applications of visually rendering sound objects in an augmented and/or virtual reality environment that can be implemented in combination with the rendering of spatial audio by spatial audio systems in accordance with various embodiments of the invention.

In numerous embodiments, control devices can be used to assist with configuration of spatial audio systems. In many embodiments, spatial audio systems can be used to assist with mapping a space. Turning now to FIG. 41, an example UI for configuration operation in accordance with an embodiment of the invention is illustrated. In numerous embodiments, control devices have depth-sensing capabilities that can assist with mapping a room. In a variety of embodiments, the camera system of a control device can be used to identify individual cells in a space. However, as noted above, it is not a requirement that a control device have an integrated camera.

In numerous embodiments, spatial audio systems can be used for music production and/or mixing. Spatial audio systems can be connected to digital and/or physical musical instruments and the output of the instrument can be associated with a sound object. Turning now to FIG. 42, an integrated digital instrument in accordance with an embodiment of the invention is illustrated. In the illustrated example, a drum set has been integrated. In a variety of embodiments, different drums in the drum set can be associated with different sound objects. In numerous embodiments, multiple drums in the drum set can be associated with the same sound object. Indeed, more than one instrument can be integrated, and any number of different arbitrary instruments are capable of integration.

While different sound objects can be visualized as described above, in many embodiments, it is desirable to have a holistic visualization of what is being played back. In numerous embodiments, audio streams can be visualized by processing the audio signal in such a way as to represent the frequencies present at any given time point in the stream. For example, audio can be processed using a Fourier transform, or by generating a Mel Spectrogram. In many embodiments, primary cells and/or super primary cells process the audio streams for which they are responsible and pass the results to the device presenting the visualization. The resulting processed audio, which describes each frequency and its respective amplitude at each given time point, can be wrapped into a helix, where the same point on each turn of the helix, offset by one pitch, reflects the same note (A, B, C, D, E, F, G and the like) at sequential octaves. In this way, when viewed from above (i.e. perpendicular to the axis of the helix), the same note in each octave lines up. A helix as described, when viewed from the side and from above in accordance with an embodiment of the invention, is illustrated in FIGS. 58A and 58B, respectively. When a particular note is played at a given octave, the helix structure can warp based on the amplitude to visualize the note. In numerous embodiments, the warped section can leave a transparent field behind it, where different turns of the helix are represented by different colors, levels of transparency, and/or any other visual indicator as appropriate to the requirements of specific applications of embodiments of the invention. In this way, multiple notes at different octaves can be simultaneously visualized. An example of a visualization using a helix in accordance with an embodiment of the invention is illustrated in FIG. 59.
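One way to realize the helix mapping described above, offered as a hedged sketch rather than as the disclosed implementation, is to place each frequency at an angle proportional to the logarithm (base 2) of the frequency, so that every doubling of frequency (one octave) completes exactly one turn and rises by one pitch of the helix. The `helix_point` function, the reference frequency of 27.5 Hz, and the linear amplitude warp of the radius are assumptions introduced for this example.

```python
import math


def helix_point(freq_hz, amplitude=0.0, ref_hz=27.5,
                radius=1.0, pitch=1.0, warp=0.5):
    """Map a frequency and amplitude to an (x, y, z) point on a helix.

    One octave corresponds to one full turn, so the same note in successive
    octaves shares (x, y) and differs in z by one pitch. The amplitude warps
    the radius outward (assumed linear warp).
    """
    turns = math.log2(freq_hz / ref_hz)   # octaves above the reference
    angle = 2.0 * math.pi * turns         # one turn per octave
    warped_radius = radius * (1.0 + warp * amplitude)
    x = warped_radius * math.cos(angle)
    y = warped_radius * math.sin(angle)
    z = pitch * turns
    return (x, y, z)


# The note A in two adjacent octaves (220 Hz and 440 Hz).
a3 = helix_point(220.0)
a4 = helix_point(440.0)
```

In this sketch `a3` and `a4` share the same (x, y) coordinates up to floating-point error and are separated along the axis by exactly one pitch, which is the alignment of like notes across octaves that the paragraph above describes.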

Further, more than one helix can be generated. For example, each instrument in a band playing a song may have its own visualization helix. Example visualization helices for multiple instruments in a band in accordance with an embodiment of the invention are illustrated in FIG. 60. However, the helix can be used for any number of visualizations depending on the desires of the user. Further, visualizations do not have to be helix based.

Helix based visualizations are not the only types of visualizations that can be utilized. In a variety of embodiments, visualizations can be attached to sound objects and represented spatially within a visualized space reflective of the real world. For example, a “sound space” can be visualized as a rough representation of any physical space containing cells. Sound objects can be placed in the sound space visualization and the sound will be correspondingly rendered by the cells. This can be used, for example, to generate an ambient soundscape such as, but not limited to, a city or jungle. The ambient jungle can be enhanced by placing objects in the sound space corresponding to monkeys on the floor of the jungle, or birds in the canopies of trees, which in turn can be rendered in the soundscape. In many embodiments, AI can be attached to placed objects to guide their natural movements. For example, a bird may hunt for bugs that are active in one region of the sound space, or bird seed could be placed to draw birds from the area. Any number of ambient environments and objects can be created using sound spaces. Indeed, sound spaces do not in fact have to be ambient. For example, instruments or functional directional alerts or beacons for guidance can be placed within a sound space and rendered in a soundscape for audio production, home safety, and/or any other application as appropriate to the requirements of specific applications of embodiments of the invention. As can readily be appreciated, sound spaces provide great opportunities for creativity and are not limited in any way to the examples recited herein, but are largely only limited by the imagination and creativity of the designers of the sound space.

In many embodiments, a playback and/or control device can be used to play back video content. In numerous embodiments, video content is accompanied by spatial audio. In many cases, a playback and/or control device might be static, e.g. a television mounted on a wall or otherwise in a static location. As described above, spatial audio systems can render spatial audio relative to the playback and/or control device. However, in a variety of embodiments, playback and/or control devices are mobile and can include (but are not limited to) tablet computers, cell phones, portable game consoles, head mounted displays, and/or any other portable playback and/or control device as appropriate to the requirements of specific applications. In many embodiments, spatial audio systems can adaptively render spatial audio relative to the movement and/or orientation of a portable playback and/or control device. When a playback and/or control device contains an inertial measurement unit, such as, but not limited to, gyroscopes, accelerometers, and/or any other positioning system capable of measuring orientation and/or movement, orientation and/or movement information can be used to track the device in order to modify the rendering of the spatial audio. It should be appreciated that spatial audio systems are not restricted to using gyroscopes, accelerometers, and/or other integrated positioning systems. In many embodiments, positioning systems can further include machine vision based tracking systems, and/or any other tracking system as appropriate to the requirements of specific applications of various embodiments of the invention. In some embodiments, the location of the user can be tracked and used to refine the relative rendering of the spatial audio.

As noted above, spatial audio systems in accordance with certain embodiments of the invention provide user interfaces via mobile devices and/or other computing devices that enable placement of audio objects. In a number of embodiments of the invention, the user interface can enable the coordinated movement of all audio objects or a subset of audio objects (rotation around an origin is often referred to as wave pinning). Turning now to FIG. 43, a UI provided by a mobile device including affordances enabling wave pinning in accordance with an embodiment of the invention is illustrated. As can readily be appreciated, spatial audio systems in accordance with various embodiments of the invention can also support spatial audio rendering in a manner that supports the coordinated translation and/or other forms of movement of multiple spatial audio objects and can provide UIs accordingly.
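The coordinated rotation of audio objects around an origin can be sketched with a standard two-dimensional rotation applied to every object position. The `rotate_objects` function and the representation of positions as (x, y) tuples are assumptions introduced for this illustration; the disclosure does not prescribe a particular coordinate representation.

```python
import math


def rotate_objects(positions, angle_rad, origin=(0.0, 0.0)):
    """Rotate every (x, y) position counterclockwise about origin by angle_rad."""
    origin_x, origin_y = origin
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    rotated = []
    for x, y in positions:
        dx, dy = x - origin_x, y - origin_y
        rotated.append((origin_x + cos_a * dx - sin_a * dy,
                        origin_y + sin_a * dx + cos_a * dy))
    return rotated


# Rotate two audio objects a quarter turn about the center of the space.
objects = [(1.0, 0.0), (0.0, 2.0)]
turned = rotate_objects(objects, math.pi / 2)
```

Because every object is rotated by the same angle about the same origin, the distances between objects are preserved, so the arrangement moves as a rigid group. A coordinated translation could be sketched in the same way by adding a common offset to each position.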

In addition to enabling placement of multiple audio objects via a UI, spatial audio systems in accordance with many embodiments of the invention can also enable placement of multiple spatial audio objects based upon the tracked movement of one or more users and/or user devices. Turning now to FIG. 44, a series of UI screens are illustrated in which movement of spatial audio objects relative to the locations of three cells is tracked using inertial measurements made by a user device. As noted above, any of a variety of tracking techniques can be utilized to generate telemetry data that can be provided to a spatial audio system to cause audio objects to move with or in response to movements of a user and/or a user device.

While a number of different UIs are described above, these UIs are included for illustrative purposes only and do not in any way constitute the full scope of potential UI configurations. Indeed, an extensive array of UI modalities can be utilized to control the functionality of spatial audio systems configured in accordance with various embodiments of the invention. The specific UIs provided by spatial audio systems will typically depend upon the user input modalities supported by the spatial audio system and/or user devices that communicate with the spatial audio system and/or the capabilities provided by the spatial audio system to control spatial audio reproduction.

Although specific systems and methods for rendering spatial audio are discussed above, many different fabrication methods can be implemented in accordance with many different embodiments of the invention. It is therefore to be understood that the present invention may be practiced in ways other than specifically described, without departing from the scope and spirit of the present invention. Thus, embodiments of the present invention should be considered in all respects as illustrative and not restrictive. Accordingly, the scope of the invention should be determined not by the embodiments illustrated, but by the appended claims and their equivalents.

What is claimed is:
 1. A spatial audio system, comprising: a plurality of communicatively coupled loudspeakers, where each loudspeaker in the plurality of communicatively coupled loudspeakers comprises: at least three, equally spaced, co-planar horns arranged in a circle such that the firing direction of the at least three horns points outward from the center of the circle, where each horn is coupled with at least one driver; where a given loudspeaker in the plurality of communicatively coupled loudspeakers is configured to: receive an input audio stream; generate at least one derived audio stream from the input audio stream for each loudspeaker in the plurality of communicatively coupled loudspeakers; and transmit the at least one derived audio stream to each respective loudspeaker in the plurality of communicatively coupled loudspeakers; where each loudspeaker in the plurality of communicatively coupled loudspeakers is configured to playback its respective at least one derived audio stream using its respective horns in order to render spatial audio; and a user interface device capable of selecting the audio stream, and directing the given loudspeaker to generate the at least one derived audio stream for each loudspeaker in the plurality of communicatively coupled loudspeakers such that the rendered spatial audio comprises spatial audio objects at locations indicated by a graphical user interface provided by the user interface device.
 2. The spatial audio system of claim 1, wherein each loudspeaker in the plurality of loudspeakers is further configured to generate a different driver signal from the at least one derived audio stream for each driver.
 3. The spatial audio system of claim 1, wherein the rendered spatial audio emulates surround sound.
 4. The spatial audio system of claim 3, wherein the emulated surround sound conforms to a specific number of channels; and wherein the number of loudspeakers in the plurality of communicatively coupled loudspeakers is less than the specific number of channels.
 5. The spatial audio system of claim 1, wherein the input audio stream is an audio file.
 6. The spatial audio system of claim 1, wherein each horn is coupled with a mid-range driver and a tweeter.
 7. The spatial audio system of claim 1, wherein the plurality of communicatively coupled loudspeakers comprises three loudspeakers arranged in a triangle.
 8. A method for rendering spatial audio, comprising: receiving an input audio stream using a given loudspeaker in a plurality of communicatively coupled loudspeakers, where each loudspeaker in the plurality of communicatively coupled loudspeakers comprises at least three, equally spaced, co-planar horns arranged in a circle such that the firing direction of the at least three horns points outward from the center of the circle, where each horn is coupled with at least one driver; generating at least one derived audio stream from the input audio stream for each loudspeaker in the plurality of communicatively coupled loudspeakers using the given loudspeaker; transmitting the at least one derived audio stream to each respective loudspeaker in the plurality of communicatively coupled loudspeakers from the given loudspeaker; rendering spatial audio using the plurality of communicatively coupled loudspeakers by playing back the respective at least one derived audio stream for each loudspeaker in the plurality of communicatively coupled loudspeakers using its respective horns; providing a graphical user interface capable of selecting the input audio stream; and directing the given loudspeaker to generate the at least one derived audio stream for each loudspeaker in the plurality of communicatively coupled loudspeakers such that the rendered spatial audio comprises spatial audio objects at locations indicated by the graphical user interface.
 9. The method for rendering spatial audio of claim 8, further comprising generating, for each specific loudspeaker in the plurality of communicatively coupled loudspeakers, a different driver signal from the at least one derived audio stream for each driver of the specific loudspeaker.
 10. The method for rendering spatial audio of claim 8, wherein the rendered spatial audio emulates surround sound.
 11. The method for rendering spatial audio of claim 10, wherein the emulated surround sound conforms to a specific number of channels; and wherein the number of loudspeakers in the plurality of communicatively coupled loudspeakers is less than the specific number of channels.
 12. The method for rendering spatial audio of claim 8, wherein the input audio stream is an audio file.
 13. The method for rendering spatial audio of claim 8, wherein each horn is coupled with a mid-range driver and a tweeter.
 14. The method for rendering spatial audio of claim 8, wherein the plurality of communicatively coupled loudspeakers comprises three loudspeakers arranged in a triangle.